Courant  Computer  Science  Report  #  7 

August  1975 


Directions in Artificial Intelligence:
Natural  Language  Processing 


Ralph  Grishman,  Editor 


Courant  Institute  of 
Mathematical  Sciences 

Computer  Science  Department 

New  York  University 

Report  No.  NSO-7  prepared  under  Contract 
No.  N00014-67A-0467-0032 
with  the  Office  of  Naval  Research 



COURANT COMPUTER SCIENCE PUBLICATIONS

COURANT COMPUTER SCIENCE NOTES                                    Price

Programming Languages and Their Compilers,
J. Cocke & J. T. Schwartz, 2nd Revised Version,
April 1970, iii+767 pp.                                          $19.25

On Programming: An Interim Report on the SETL Project.
Part I: Generalities.
Part II: The SETL Language and Examples of Its Use.
(Parts I and II are consolidated in this volume.)
J. T. Schwartz, Revised June 1975, xii+675 pp.                   $17.25

A SETLB Primer. H. Mullish & M. Goldstein, 1973, v+201 pp.         5.25

Combinatorial Algorithms, E. G. Whitehead, Jr., 1973, vi+104 pp.   2.75

COURANT COMPUTER SCIENCE REPORTS

No. 1  ASL: A Proposed Variant of SETL,
       Henry Warren, Jr., 1973, 326 pp.

No. 2  A Metalanguage for Expressing Grammatical Restrictions
       in Nodal Spans Parsing of Natural Language,
       Jerry R. Hobbs, 1974, 266 pp.

No. 3  Type Determination for Very High Level Languages,
       Aaron M. Tenenbaum, 1974, 171 pp.

No. 4  A Comprehensive Survey of Parsing Algorithms for
       Programming Languages, Phillip Owens, Forthcoming.

No. 5  Investigations in the Theory of Descriptive Complexity,
       William L. Gewirtz, 1974, 60 pp.

No. 6  Operating System Specification Using Very High Level
       Dictions, Peter Markstein, 1975, 152 pp.

No. 7  Directions in Artificial Intelligence: Natural Language
       Processing, Ed. Ralph Grishman, 1975, 107 pp.

No. 8  A Survey of Syntactic Analysis Procedures for Natural
       Language, Ralph Grishman, 1975, 94 pp.


A catalog of SETL Newsletters and other SETL-related
material is also available. Courant Computer Science
Reports are available upon request. Prepayment is
required for Courant Computer Science Notes. Please
address all communications to

COURANT INSTITUTE OF MATHEMATICAL SCIENCES
251 Mercer Street
New York, N. Y. 10012


COURANT  INSTITUTE  OF  MATHEMATICAL  SCIENCES 


Computer  Science  NSO-7 

Proceedings  of  a  Symposium  on 

DIRECTIONS  IN  ARTIFICIAL  INTELLIGENCE 
NATURAL  LANGUAGE  PROCESSING 

Edited  by 
Ralph  Grishman 


Report  No.  NSO-7  prepared  under 
Contract  No.  N00014-67A-0467-0032 
with  the  Office  of  Naval  Research 


TABLE OF CONTENTS

Introduction

Modeling Dictionary Data
    Robert Simmons and Robert Amsler
    University of Texas at Austin

Computerized Discovery of Semantic Word Classes in Scientific Fields
    Naomi Sager
    New York University

The OWL Concept Hierarchy
    William Martin
    Massachusetts Institute of Technology

Design of the Underlying Structure for a Data Base Retrieval System
    Stanley Petrick
    IBM Corporation

Discussion




INTRODUCTION 

The Computer Science Department of New York University, under
contract to the Office of Naval Research, has endeavored to
examine and review recent progress in selected areas of artificial
intelligence. One phase of this effort has been the
preparation of reports which present, in a unified framework,
the variety of approaches which have been taken to particular
artificial intelligence problems. Another phase has involved
organizing symposia among leading workers in the field to discuss
current research and the prospects for the immediate future.

The first of these symposia was held on December 6, 1974 at
the Courant Institute of Mathematical Sciences, New York
University. This symposium considered the problems of natural language
processing systems, and in particular the problems of collecting,
organizing, and using semantic information. To put the research
which will be discussed below in perspective, let me try to
characterize very briefly some of the current systems with a
natural language "front end," such as those of Woods, Petrick,
and Winograd. These systems accept a language with a moderately
rich syntax which is, however, still a quite restricted subset
of English. Most of these natural language processors have been
used as part of simple question-answering or robot-control systems.
These applications provide relatively clear criteria for whether
the system has understood an input or not. At the same time,
these applications are conducive to simpler sentences than those
encountered in, say, scientific journal articles. The semantics
underlying these systems are all quite simple: the
question-answering systems usually do retrieval or simple computation
from a tabular data base, while the robot systems work in a
very simple environment.

What seems reasonable as a next step for natural language
systems? Certainly not reading and understanding a novel -- that
is much too far beyond the current state of the art. We have
to find an area with a somewhat more complicated but still limited
semantics. Perhaps the semantics of a specialized scientific or
technical field would be appropriate. For example, one might
consider developing a system to analyze a programming language
manual, an equipment repair manual, or some type of specialized
scientific report.

The syntactic problems which would be involved no longer seem
too bad. There is general -- though not universal -- agreement
on transformational grammar as a framework for syntactic analysis,
even though the actual parsing procedures differ significantly.
Among those systems producing an explicit "deep structure", there
is considerable agreement on its general form. Several successful
procedures have been developed, using augmented context-free
grammars, unrestricted phrase-structure grammars, and
transformational grammars (for a survey of these various procedures, see
Courant Computer Science Report No. 8). The further increases in
complexity required to handle the proposed applications, while
not small, are probably less than an order of magnitude beyond
the most advanced present systems. At least one current system
which is restricted to syntactic analysis (Sager) is already
able to handle most of the constructions which would be required.

On the semantic side, in contrast, large increases in complexity
beyond current question-answering systems will be required.
There has been relatively little theory to support the development
of semantic components, and there is correspondingly little
consensus on the structure of these components. In the most
successful current systems, the semantics has been designed
directly to fit a particular application, and consequently
appears to be hard to generalize to broader applications.

One idea, which has been around for a long time and is
currently receiving a good deal of attention from computational
linguists (particularly Schank and his followers), is the
decomposition of semantic structures into a small number of
primitive semantic elements. This does seem to be a fundamentally
valuable idea, but anyone trying to use it is faced with the
problem of selecting a set of primitives, and so far everyone
has come up with a set of his own. The trouble is that there
are at present no clear criteria for selecting these primitives,
except perhaps for the vague one that "they seem to work well
in our system".

We invited five researchers currently active in natural
language processing to discuss these and other problems of
semantic representation. (A portion of the letter of invitation
is reproduced below.) These speakers were (in the
order in which the talks were given):

    Robert Simmons     University of Texas, Austin
    Naomi Sager        New York University
    William Martin     Massachusetts Institute of Tech.
    Stanley Petrick    International Business Machines Corp.
    William Woods      Bolt Beranek and Newman

The rest of this volume contains the transcripts of the
talks given by all but the last speaker (as edited by the
speakers and the chairman), and the general discussion among
speakers and audience which concluded the symposium.

R.  Grishman 
Courant  Institute 
August 1975




From  the  letter  of  invitation: 

Over  the  past  few  years  we  have  seen  two  somewhat  separate  paths 
of  development  in  natural  language  processing,  paths  which  can  be 
described  as  the  broad  and  shallow   and  the  deep  and  narrow. 

The broad and shallow path has been followed by researchers in
such areas as machine translation and text retrieval. Here,
typically, the area of discourse is large and the texts have little
explicit structure (beyond the syntactic structure of the language).
Research has been concerned largely with obtaining syntactic
analyses of high quality. Semantic information has been rather
limited, typically consisting of subclass assignments and
cooccurrence restrictions to eliminate incorrect parses. Despite
its limitations, this path has begun to address itself to the
problems of subclass assignment for sizable vocabularies, and
hence to the task of explicating the structure of discourse in
a field for language processing purposes.

More attention of late has been focused on the deep and narrow
path required in fact retrieval and command and control
applications. Here, much more detailed semantic and pragmatic information
is required. By limiting themselves to explicitly highly
structured areas, such as retrieval of tabular information or
manipulations in simple, artificial worlds, researchers have been
able to assemble the requisite semantic information. This
information has generally been encoded as a set of procedures,
in part a testimony to the complexity and lack of uniform
structure in the semantic data.

We may reasonably expect this deep channel to be steadily
widened over the next few years. We may aim, for example, toward
fact retrieval systems in larger and less explicitly structured
but still limited domains, such as the contents of a repair manual,
programming language reference manual, or medical report.
Unfortunately, the current deep-and-narrow systems, while often
impressively successful, seldom give clear indication of how their
semantic information may be expanded in an orderly fashion to a
larger domain. Perhaps the broad-and-shallow research may
provide valuable input on how to structure larger fields.

These  are  the  problems  we  would  like  to  address  at  this  seminar. 
We  are  particularly  interested  in  any  developing  methodology  for 
the  collection  of  semantic  information,  which  has  been  assembled 
until  now  in  a  rather  ad  hoc  manner.  We  would  ask  each  speaker 
to  spend  some  time  describing  his  past  experiences  and  some  time 
discussing  how  he  might  approach  the  problems  described  above 
in  a  systematic  fashion. 




MODELING  DICTIONARY  DATA 

Robert  F.  Simmons  and  Robert  A.  Amsler 
Department  of  Computer  Sciences,  The  University  of  Texas  at  Austin 

Abstract 

Forms and structures of definitions in Merriam-Webster's
dictionaries are presented to derive models of sense selection
contexts, sense meanings, and hierarchical relations among verbs.
The sense meaning model is presented as a case-role semantics
accompanied by time-ordered sets of assertions marked for truth
value. Systematic extraction of these types of models from
dictionary data is argued to be an encouraging line of research.


I think we are all agreed that since about 1968 we have had
some very interesting natural language processing systems that
have very limited but powerful capabilities within those
limitations. The critical thing that has been going on in these
is to choose a microworld (as Ralph Grishman suggested in the
introduction) where there is very little that in fact can be said,
and then you are able to manage the semantics.

Now the situation of trying to understand text is quite
different. You know, examples of those microworlds are the
Woods Airline Guide [2] and the lunar rocks data base [3], the
Winograd hand-and-blocks [4], the Heidorn trucks at a gas
depot [5], and Schank's John and Mary -- with John's attitude
toward Mary. And there are others. I think that in text
processing we are nowhere near such good shape. I have seen
interesting experiments by Sager about 3 years ago [6] in terms
of getting some semantic structure out of documents, and I recall
early work by Harris [7] to get kernel sentences out of documents.
And most recently we had some work by Charniak on children's
stories [8] in which the great contribution was, I think, to show
how incredibly difficult it is to solve the problem of reference
even there, in order to find the antecedent for a pronoun, 'it'.
There are a couple of examples where you have to have the full
strength of some kind of a problem solving / theorem prover system.

*
Partially supported by NSF Grant GJ509E

Most recently, there is a paper that hasn't gotten around very
much: a very ambitious effort by Roger Schank [9] on setting up
discourse structure for paragraphs. That paper will probably
be circulating pretty widely by Spring. In that, he very
ambitiously takes the folk tale from Eskimo literature --
several-paragraph stories, about 5 I think -- and he makes a causal
chain of the relationships among the deep conceptual dependency
structures that he derives. It is easily criticizable. There are
many things where each of us would differ on whether that is the
path or exactly what is going on. But I admire it very much
because it is a very ambitious attempt to deal with a fairly large
piece of text. The point is we don't have models for dealing
with large amounts of text, or even with several paragraphs of
text. We don't really know how to handle the semantics of text
discourse.

The  basic  problems  are:   How  do  you  use  sentences  to  understand 
following  sentences?   How  do  you  solve  problems  of  reference?   And 
what  is  an  adequate  lexical  structure  to  do  this?   What  is  an 
adequate  semantic  structure  to  represent  the  resulting  meanings? 
That  whole  family  of  questions  needs  to  be  answered. 

We  are  very  much  concerned  in  my  group  at  Texas  with  studying 
all  these  things.   We  have  one  student  (we  don't  have  a  big 
budget  for  this)  but  one  student  is  working  on  discourse  analysis: 
a  dissertation  by  Hendrix  [10]  is  about  finished  on  modelling 
techniques  --  an  approach  toward  representing  meanings  as  set- 
theoretic  expressions.   It  takes  semantic  nets  right  down  to  the 
abstract  algebraic  description  of  what  the  meaning  of    sentences 
is.   And  what  I  will  talk  about  today  is  Robert  Amsler's  studies 
of  the  Merriam-Webster  dictionaries. 

With that introduction, let me shift to the business of
computational lexicology. I think 'lexicology' is a fine enough word.
It is in contrast to lexicography -- studying how to put together
meanings to make a dictionary is lexicography. Lexicology is,
as best I understand it, the study of how the meanings are
organized; what is the hierarchical structure; how much in the
way of loops and things of this type occur; and how can you
take and transform from this list, 150,000 entries,
alphabetically organized, into some organization that is computationally
more significant. That, essentially, is the study. The
desired outcome is to transform the dictionary into data that
is reasonably computable; and hopefully to get semantic models
out with considerably less difficulty than is currently the case.

Now I want to talk about the kinds of semantic models. There
are really three kinds of data that we are able to get from
the English lexicon. One is a model for the meaning of a word.
And I will show examples of what I mean by model shortly. The
next is a model that for each sense meaning of a word can detect
the context patterns that will select that sense rather than
some other. The third thing, of course, is that we want to organize
the whole set of words that use that word in their definitions.
Now these things will, I hope, become clearer in a little bit.

First, to the dictionary. I think we know from a linguistic
point of view that the proper source of semantic information is
a large corpus of ordinary usage of the language -- a very large
corpus; the Brown-Kucera corpus [11] is one example; perhaps
larger ones are needed. In about 1966, while I was still at
System Development Corporation, with John Olney, John put in a
proposal to NIH to keypunch Webster's Collegiate Dictionary.
He managed to get the grant, and it cost about $35,000 and about
two years to get the thing keypunched with great accuracy -- one
error in a thousand strokes, or something of this level, was the
quality control on it. And he incidentally keypunched Webster's
Pocket Dictionary [12]. Now the mass of data here is quite
significant. The Webster's Collegiate currently resides on
eight tapes. Now, several years later, we could probably get
that on 2 or 3 tapes in high density form. The pocket dictionary
fits comfortably on one tape, but it pretty much fills it up.
Whenever one does some computation with these data bases it is
likely to be very expensive. So I have really spent a great
deal of time over the past 5 years at Texas discouraging people
from running these tapes through the computer unless they had
a really strong hypothesis and a well formulated plan.

Now, why this particular lexicon? I think somewhere we are
aware of the fact, for example, that Random House also has a
keypunched version of a rather large dictionary [13]. And some
work has been done on that too. John Olney liked the Webster's
dictionary because he went up and visited the people in
Springfield; Gove was editor at that time; and John was very much
impressed by the way that they went about making their dictionary.
What they have done is to accumulate citations for perhaps fifty
or one hundred years, and they have room after room of files
filled with citations of usages of words. So there is the corpus
on which that dictionary is based. Now many not so careful
dictionary makers don't do that. You know, their offices are
typically a dozen filing cabinets and a large shelf of
dictionaries, and they sort of rewrite the definitions from one
dictionary, on an intuitive basis or something, into another form.
Well, Merriam-Webster's is a much more careful operation. People
collect these citations and the process of making a dictionary
from the citations is as follows.

Let  us  suppose  that  for  the  word  'move'  you  might  have  a 
thousand  citations.   And  so  you  start  putting  them  into  heaps  of 
similar  usages.   You  sort  them  out  into  pile  and  pile,  pile,  pile; 
and  then  you  decide  how  to  describe  each  pile.   OK,  the  resulting 
description  is  a  sense  meaning  for  one  usage  of  the  word  'move'. 
Well,  in  Webster's  Third  International  [14],  in  addition  to 
presenting  that  description,  a  usage  example  is  also  included. 
And  in  the  Collegiate  it  is  very  frequently  the  case  that  with 
the  sense  meaning  there  will  be  an  example. 

So what I am suggesting is that the Merriam-Webster's
dictionaries are in fact based on a large corpus of English usages that
are carefully collected. Now I suppose the weakness here is
that it is all written usages, I imagine, I am not sure --
rather than spoken language. When a sense description is
compiled with a typical usage example one is not quite sure
how typical that example is and not quite sure how free it is
from particular biases of a particular lexicographer who is
liable to be an English teacher brought in every few years to
go through the massive effort of producing a new edition of
the dictionary. I think there is a new one due out in 1976,
a Fourth International. Well, I am not too close to that, but
linguistic quality is the reason that John Olney chose the
Merriam-Webster's dictionary as a data base that was suitable
for keypunching.

Now we have been poking around with dictionaries for quite
some time, and they are really hard to work with. I might add,
by the way, there are about 20 copies of these tapes around
the country in various research groups, and not too much has
been done in the seven years that that corpus has been available.
The reason is that it is very, very difficult and expensive
to work with it, and very hard to know what you are doing while
you work with it. And I am sorry to say you may get that idea
very clearly by the time this lecture is finished.

Now let me show you some of the things that emerge from
Amsler's work on this dictionary. One of the first things
that we have all noticed in the dictionary is that there appear
to be hierarchical properties. Words are defined in terms of
other words. And there is a tendency to go up the tree to an
increasingly more abstract word. So 'march' might be defined
in terms of 'walk' and 'walk' eventually gets up to 'move'.
Figure 1 is an example of that. 'Retort' is defined "to answer
sharply". And 'answer' is defined in its turn "to write or
speak in response". 'Write' is "to communicate in print" and
'speak' is "to communicate by voice". Finally, "to communicate"
goes to "to make known", so we notice that the hierarchy is quite
regular and we can now define "to retort" as "to make known by
voice, in response, sharply".




Figure 1

Using a Hierarchy of Verbs

RETORT        -   TO ANSWER SHARPLY

ANSWER        -   TO WRITE OR SPEAK IN RESPONSE

WRITE         -   COMMUNICATE IN PRINT

SPEAK         -   COMMUNICATE BY VOICE

COMMUNICATE   -   MAKE KNOWN

TO RETORT     =>  MAKE KNOWN, BY VOICE, IN RESPONSE, SHARPLY

              =>  MAKE KNOWN, IN PRINT, IN RESPONSE, SHARPLY


Now what has been done? Well, we have taken advantage of the
lexicographer's rather careful behavior to define a word in
terms of a superclass word, a higher, more abstract term,
while carefully setting out differentia as modifying phrases.
So "to retort" is "to answer sharply". "To answer" is "to write
or speak in response". So we are able in this hierarchy simply
to lift up the differentia and to define everything in terms of
the top-level verb. Well, this is rather interesting. Many of us
are aware of Schank's assertion that fourteen primitive verbs
will account for all of the verb usages in English [15]. I doubt
the number fourteen. On the other hand, it is fairly clear that
a single primitive accounts for hundreds of verbs.
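The lifting of differentia up the hierarchy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table layout, the name DEFINITIONS, and the function expand are hypothetical, the entries are transcribed from Figure 1 rather than taken from the dictionary tapes, and the sketch assumes the hierarchy contains no loops.

```python
# Hypothetical toy lexicon transcribed from Figure 1. Each verb maps to a
# list of (genus verb, differentia) alternatives; the layout is an assumption.
DEFINITIONS = {
    "retort": [("answer", "sharply")],
    "answer": [("write", "in response"), ("speak", "in response")],
    "write": [("communicate", "in print")],
    "speak": [("communicate", "by voice")],
    "communicate": [("make known", None)],
}


def expand(verb):
    """Return every (top-level verb, differentia list) reading of `verb`.

    Differentia picked up nearer the top of the hierarchy come first,
    as in the last two lines of Figure 1. Assumes no circular definitions.
    """
    if verb not in DEFINITIONS:      # not defined here: treat as top-level
        return [(verb, [])]
    readings = []
    for genus, differentia in DEFINITIONS[verb]:
        for top, lifted in expand(genus):
            extra = [differentia] if differentia else []
            readings.append((top, lifted + extra))
    return readings


for top, differentia in expand("retort"):
    print(", ".join([top] + differentia))
```

Run on 'retort', the sketch yields one reading per branch of 'answer' (write or speak), each phrased in terms of the single top-level verb 'make known'.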

In these studies, we first looked at some 600 verbs of
communication. In order to do that we inverted the dictionary
so that for each word that occurred in a definition we had an
entry; and associated with that entry were the definitions that
it had occurred within. So by inverting the dictionary file
we were able to sort things and discover just how words go up
the tree -- and sometimes around in circles. In the 600 verbs
of communication there are 3 senses of meaning for communication.
We haven't studied them in exhaustive enough detail to be
absolutely sure, but it is my impression that 2 or 3 models of
'make known', i.e. 'communicate', will account for some 600 verb
meanings by following this pattern of carrying up the differentia.
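As a rough sketch of what inverting the dictionary file amounts to, the Python below builds an index from each word used in a definition to the headwords whose definitions contain it, and follows headword-to-kernel links to detect a loop. The names ENTRIES, invert, and find_cycle are hypothetical, the five entries are taken from Figure 1, and the 'begin'/'start' pair is an invented example of a circular definition, not data from the tapes.

```python
from collections import defaultdict

# Hypothetical miniature dictionary; the real tapes held far more entries.
ENTRIES = {
    "retort": "to answer sharply",
    "answer": "to write or speak in response",
    "write": "to communicate in print",
    "speak": "to communicate by voice",
    "communicate": "to make known",
}


def invert(entries):
    """Map each word used in a definition to the headwords it helps define."""
    index = defaultdict(set)
    for headword, definition in entries.items():
        for word in definition.split():
            index[word].add(headword)
    return index


def find_cycle(kernels, start):
    """Follow headword -> defining-kernel links; return the loop if one occurs."""
    path = [start]
    while path[-1] in kernels:
        nxt = kernels[path[-1]]
        if nxt in path:
            return path[path.index(nxt):] + [nxt]
        path.append(nxt)
    return None


index = invert(ENTRIES)
print(sorted(index["communicate"]))  # headwords defined in terms of 'communicate'
print(find_cycle({"begin": "start", "start": "begin"}, "begin"))  # invented loop
```

Looking up a kernel verb in the index gives the set of verbs one level below it, which is the kind of lookup needed to collect all verbs defined in terms of a given superclass.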

Most  recently,  Amsler  has  studied  200  verbs  of  motion;  and  the 
way  he  got  these  was  a  little  bit  simpler.   The  communication 
verbs  are  a  deep  hierarchy.   On  the  verbs  of  motion,  he  simply 
took  from  the  inverted  file  all  of  the  verbs  that  use  the  word 
'to  move'  as  the  defining  kernel  —  i.e.  as  the  superclass;  and 
those  go  up  to  what  appear  to  be  some  5  or  6  sense  meanings  of 
'move'.   I  will  show  you  these  sense  meanings  shortly. 

The point of the hierarchy of verbs, particularly with
reference to collecting semantic information, is: how can we sort
the material that occurs in a good dictionary so that conceivably
we can make models for 20, 50 or 100 units of meaning, and
classify them? There will be a great saving computationally
if we are able to do this.

Let's see what I mean by a model. Let's consider a sentence,
"John wired a greeting to Mary". We are still in the verbs of
communication here. "To wire" is "to telegraph". "To telegraph"
is "to communicate", with instrument telegraph and medium telegram.
Now the representation of meaning that I found quite useful for
answering questions is the semantic net structures that I have
talked about many times, but the notation is simpler now. In
Figure 2, C1 is a token of 'communicate'; the verb used was 'wire'.
The tense is past; Actant or Agent - John; Theme - a greeting;
Source - John; Goal - Mary; Instrument - telegraph; Medium -
telegram. That is what we can get out of a shallow level analysis
of the sentence "John wired a greeting to Mary". Now that is
just the first stage. I am not quite sure of terminology yet,
but I think of that as a shallow semantic analysis or a shallow
semantic structure. I used to call it a deep case structure
but it is really not very deep at all. Those of you who are
acquainted with the dependency structure can see that this head
verb is simply in a flat tree dominating all of these arguments.
Now the structure is really quite syntactic although it has
semantic aspects. I also assume, by the way, that we have
selected the particular sense meanings of these words.
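One way to picture how the shallow structure for "John wired a greeting to Mary" could be assembled is sketched below in Python. It is an assumption-laden illustration, not the system described in the talk: the LEXICON table and the function shallow_structure are hypothetical, and the role assignments for the sentence (A, TH, S, G) are supplied by hand rather than produced by a parser.

```python
# Hypothetical lexical entries following Figure 2: each verb names its
# superclass verb and the case arguments its definition contributes.
LEXICON = {
    "wire": {"super": "telegraph", "args": {}},
    "telegraph": {"super": "communicate",
                  "args": {"INST": "telegraph", "MED": "telegram"}},
    "communicate": {"super": None, "args": {}},
}


def shallow_structure(verb, tense, sentence_args):
    """Build a flat case frame: one head verb dominating all its arguments."""
    frame = {"VB": verb, "TENSE": tense, **sentence_args}
    current = verb
    while current is not None:
        for role, filler in LEXICON[current]["args"].items():
            frame.setdefault(role, filler)   # sentence and specific verbs win
        frame["TOK"] = current               # ends as the top-level verb
        current = LEXICON[current]["super"]
    return frame


c1 = shallow_structure(
    "wire", "past",
    {"A": "John", "TH": "greeting", "S": "John", "G": "Mary"},  # hand-assigned
)
print(c1)
```

The resulting frame carries the instrument and medium contributed by the definition of 'telegraph' even though neither is mentioned in the sentence.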




Figure 2
A Communication and Some of its Meanings


JOHN WIRED A GREETING TO MARY


To Wire       -   To Telegraph

To Telegraph  -   To Communicate, Inst: Telegraph, Med: Telegram


(C1 TOK COMMUNICATE, VB WIRE, TENSE PAST,
    A JOHN, TH GREETING, S JOHN, G MARY,
    INST TELEGRAPH, MED TELEGRAM)


Communicate: Assert

((Know A TH t1 tn T)
 (Send A MED t2 t3 T)
 (Get G MED t3 t4 P)
 (Know G TH t4 tu P)
 (Want A (Know G TH) t1 t3 T))


Instantiation:

((Know John Greeting t1 tn T)
 (Want John (Know Mary Greeting) t1 t3 T)
 (Send John Telegram t2 t3 T)
 (Get Mary Telegram t3 t4 P)
 (Know Mary Greeting t4 tu P))




Now we want more. First of all, associated with the
definition of 'wire' are things that will take it up to 'communicate',
and add arguments to the meaning of communicate. But now
associated with communicate also, in addition to the syntactic
data that is needed to sort out its arguments, will be a set of
assertions. If somebody is to communicate something it is an
event that occurs in time. So at the initial time, t1, the
agent or actant knows the theme. John knows the greeting he
sends. I think 'know' will be a primitive.

At t2 he sends the medium, which in this case is the telegram.
It might have been a letter. It might have been anything that
carries information in regard to a communicate verb. At time t3
it is possible the goal person will receive that telegram; at
time t4 it is possible that the goal person will know the message.
So the final values of each assertion are: true, true, true,
possible, possible. It is also the case that John, the agent,
wanted the goal person to know the message, and he wanted this at
least from t1 to t3 when he sent it. Now that is a fair but not
absolutely necessary type of inference. So what we are doing is
to associate with 'communicate' a set of assertions defined over
the argument variables. And when these are applied to the
particular usage we discover that John knows the greeting from
time t1 to tn, which means that he knew the greeting all along;
John wanted Mary to know the greeting from t1 at least until he
sent it; John sent the telegram during the time interval t2 - t3;
it is possible that Mary got the telegram during time t3 - t4;
and it is possible that Mary knows the greeting at some time
t4 to tu. It is also important to point out that the assertions
that t1 is before t2, t2 is before t3, etc., and tu is before now
all hold. So what is going on here? This is really what we mean
by a model. We mean that the assertions that are explicitly made
in the sentence are one part of the model of the meaning of that
sentence and the assertions that are implied in the sentence are
another part of the model. So the lexical structure that we seek
would make those explicitly available. Now we know from the work
of Charniak and Schank's students, Riesbeck [16], Rieger [17],
and Goldman [18], that these things are definitely going to be
useful in discourse analysis, although I confess I don't know how
to do discourse analysis with this type of assertion yet. That is
another line of research we are very much concerned with.
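
The idea of attaching to 'communicate' a set of assertions defined
over its argument variables, and then applying them to a particular
usage, can be sketched roughly as follows. This is only an
illustration in Python, not the notation or program used in the
work described; the names Assertion, COMMUNICATE_MODEL, substitute
and instantiate are hypothetical, and the chain of "before"
relations fills in the "etc." of the talk as an assumption.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    predicate: str
    args: tuple
    start: str   # name of the time point at which the assertion begins
    end: str     # name of the time point at which it ends
    value: str   # "T" = true, "P" = possible


# Assertion templates for 'communicate', stated over argument variables
# (AGENT, THEME, GOAL, MEDIUM) rather than over particular words.
COMMUNICATE_MODEL = [
    ("Know", ("AGENT", "THEME"), "t1", "tn", "T"),
    ("Want", ("AGENT", ("Know", "GOAL", "THEME")), "t1", "t3", "T"),
    ("Send", ("AGENT", "MEDIUM"), "t2", "t3", "T"),
    ("Get", ("GOAL", "MEDIUM"), "t3", "t4", "P"),
    ("Know", ("GOAL", "THEME"), "t4", "tu", "P"),
]

# Assumed ordering of the time points; each is before the next.
TIME_ORDER = ["t1", "t2", "t3", "t4", "tu", "now"]


def substitute(term, bindings):
    """Replace argument variables in a (possibly nested) term."""
    if isinstance(term, tuple):
        return tuple(substitute(t, bindings) for t in term)
    return bindings.get(term, term)


def instantiate(model, bindings):
    """Apply the assertion templates to one particular usage."""
    return [Assertion(pred, substitute(args, bindings), start, end, value)
            for pred, args, start, end, value in model]


bindings = {"AGENT": "John", "THEME": "Greeting",
            "GOAL": "Mary", "MEDIUM": "Telegram"}
assertions = instantiate(COMMUNICATE_MODEL, bindings)
before = [("Before", a, b) for a, b in zip(TIME_ORDER, TIME_ORDER[1:])]

for a in assertions:
    print(a)
```

The same templates could be instantiated for any other verb that is
taken up to 'communicate' simply by supplying different bindings,
which is the economy the approach is after.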

When  we  analyze  a  mass  of  verbs  one  of  the  things  that  we  are 
attempting  to  do  is  to  discover  the  case  arguments  that  go  along 
with  each.  For  the  verbs  of  communication  (see  Figure  3),  there 
are:   agent  —  which  is  an  animate  organism;  theme  —  contents 
of  the  communication;  source  --  the  person  or  system  that  sends 
it;  goal  --  the  receiver;  manner  —  for  example,  sharply,  harshly; 
instrument  --  telephone,  telegraph,  voice;  depth  and  length  — 
long,  short;   frequency  --  repeatedly;   intensity  --  loudly; 
and  medium  —  the  form  of  the  communication. 

Figure 3
Arguments of Communication Verbs

AGENT           Person or Animate
THEME           Contents of Communication
SOURCE          Place or Person
GOAL            Person
MANNER          e.g. Sharply, Harshly
INSTRUMENT      e.g. Telephone
DEPTH/LENGTH    e.g. Long ... Descant
FREQUENCY       e.g. Repeatedly ... Nag
INTENSITY       e.g. Loudly ... Bellow
MEDIUM          e.g. Form of Communication - Letter, Telegram




I think that is enough for communication verbs, as a notion
to get started with. Figure 4 is an example of the data that
has been put together. This is one page from the 200 verbs
of motion. What has gone on here is shown, for example, in
'paddle 2.1': moves; theme -- the hands and feet; path -- about;
medium -- presumably water or something of this sort. So what is
done is to take the verbs and, using an ordinary editor program,
go through with as little change in the definition as possible
and mark the category of each distinguishing argument for the
usage. In fact, it takes a lot of sorting in heaps before one
can get categories organized, and then one begins to edit the
definitions to get them into uniform structure.

We  are  developing  methodology  --  so  we  are  working  with  the 
pocket  dictionary;  and  it  is  quite  clear  that  the  pocket 
dictionary  has  abbreviated  many,  many  meanings  to  the  point  where 
they  are  not  useful.   I  think  "scud  —  to  move  speedily"  is  a 
good  case  in  point.   'March'  is  defined  without  even  using  the 
notion  that  it  is  on  foot.   Abbreviated  definitions  are  fine 
for  some  purposes,  but  the  Collegiate  is  much  better  for  model 
making  and  of  course  the   Third  International  is,  as  far  as  we 
can  detect   without  looking  at  it  exhaustively,  quite  good. 

We  also  use  the  dictionary  to  develop  context  patterns  that 
will  identify  the  sense  meanings  for  each  word.   Figure  5,  for 
example  is  an  analysis  of  'move'  from  the  Third  International 
which  includes  sentence  examples. 

Has a physical object changed location?

Yes. Has the physical object changed ownership or status?
    Yes. Move 3 -- e.g. The Christmas items were moving rapidly.
         We just moved to town.
    No.  Move 1 -- e.g. The cars moved down the road.

That  one  was  puzzling  for  a  while. 


[Figure 4: one page from the listing of motion verbs; the text of
the figure is illegible in this copy.]


Has a part of the physical object changed location? Yes.
Move 2 -- e.g. The trees moved gently in the breeze.

He moved restlessly in his sleep.

He pressed the button and the machine began moving.

And so on.

Here is a continuation of that chart.

Has the rate at which something was happening changed? Yes.

Move 4 -- e.g. The plot moved quickly. The melody moves upward.
Has some action which is a part of the plan or procedure
been proposed or performed? So we get into Move 6.

Move for a recess.

Revolutionaries must make their moves carefully.

Moves in social circles.
Has someone's emotional state changed?

Move 5 -- Move to tears.
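
The chart amounts to a short sequence of yes/no questions, and can
be sketched as a small decision procedure. This is a minimal
illustration only; the MoveEvent record and the classify_move
function are hypothetical names, and the sketch assumes the answers
to the questions are already known, which is of course the hard
part. The sense numbers follow Figures 5a and 5b.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoveEvent:
    """Answers to the chart's questions for one use of 'move'."""
    object_changed_location: bool = False
    ownership_or_status_changed: bool = False
    part_changed_location: bool = False
    rate_changed: bool = False
    planned_action_proposed_or_performed: bool = False
    emotional_state_changed: bool = False


def classify_move(e: MoveEvent) -> Optional[str]:
    """Walk the questions in the order the chart asks them."""
    if e.object_changed_location:
        # Whole object moved: did its ownership or status change too?
        return "MOVE 3" if e.ownership_or_status_changed else "MOVE 1"
    if e.part_changed_location:
        return "MOVE 2"
    if e.rate_changed:
        return "MOVE 4"
    if e.planned_action_proposed_or_performed:
        return "MOVE 6"
    if e.emotional_state_changed:
        return "MOVE 5"
    return None  # none of the six senses applies


# "The cars moved down the road."
print(classify_move(MoveEvent(object_changed_location=True)))
# "The trees moved gently in the breeze."
print(classify_move(MoveEvent(part_changed_location=True)))
```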

So there are six senses of 'move' that Amsler distinguished
by studying the Third International. Now what does that mean
computationally? In Figure 5c we have move 1, 2, 3, 4, 5, 6.
We have the basic, most important, arguments that usually occur.
And here the agent must be human. Move 6 is to make a motion
or resolution. So we must check the theme to make sure that it
has some kind of a marker equivalent to 'statement' or 'resolve',
a 'resolution' or something of this sort; and of course we will
check the agent to determine that it is human. Emotional state,
move 5; e.g. "the novel moved me to tears"; the criterion is
that the Theme be a person and the Goal an emotional state. In
terms of our experience with microworld modeling, it is usually
sufficient to mark the nouns with particular semantic features
in this kind of an area. But will that approach work all across
the language? Semantic features are probably not sufficient, so
after defining the criteria for selecting the particular senses
of meaning, the question is: how does one mark the appropriate
information on the nouns and distinguish the types of sentences
that can be arguments for that sense? Well, we have learned a lot
in terms of microworld models, and I think this will be solved
eventually.



Figure 5a

MOVE 1,2,3

Has a physical object changed location? (macro)
  yes: MOVED FROM PLACE TO PLACE
       Has a physical object changed ownership or status?
         yes: MOVE 3
              The Xmas items were moving rapidly
              We just moved to town
         no:  MOVE 1
              The cars moved down the road
              The chess master moved his chess piece
              Have your bowels moved today
  no:  Has a part of a physical object changed location? (micro)
         yes: MOVED IN POSITION
              MOVE 2
              The trees moved gently in the breeze
              He moved restlessly in his sleep
              She pressed a button and the machine began moving
              The boat moved slowly from side to side at the dock
         no:  (continued in Figure 5b)




Figure 5b

MOVE 4,5,6

Has the rate at which something was happening changed?
  yes: MOVED IN SPEED
       MOVE 4
       The plot, melody moves quickly
       For a while there was nothing to do, but suddenly
       things really began to move
  no:  Has some action which is part of a plan or procedure
       been proposed or performed?
         yes: MOVED IN THOUGHT
              MOVE 6
              Moved for a recess
              Revolutionaries must make their moves carefully
              Moves in different social circles
         no:  Did someone's emotional state change?



Figure 5c
MOVE Context Patterns

Sense   Agent      Theme                 From/    To/     Loc     E.g.
                                         Source   Goal

MOVE 1  *          Phys-Obj              Loc1     Loc2    Medium  Travel, March
MOVE 2  *          Phys-Obj Part         Loc1                     Agitate, Fidget
MOVE 3  <Animate>  "Center of Activity"  Loc1     Loc2            Migrate1
MOVE 4             "Plot, Melody"        State1   State2
MOVE 5             Emotional State       State1   State2          Touch, Persuade, Stir2
MOVE 6  <Human>    "Statement, Resolve"  State1                   "Make a Motion"
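
Checking the arguments of a sentence against context patterns of
this kind can be sketched as follows. This is a minimal
illustration, not the procedure actually used; the noun markings in
NOUN_FEATURES, the three patterns in SENSE_PATTERNS and the function
matching_senses are all hypothetical, and only the criteria stated
in the talk (a physical-object theme for move 1; a person as theme
and an emotional state as goal for move 5; a human agent and a
'resolution'-like theme for move 6) are encoded.

```python
# Hypothetical semantic-feature markings on a few nouns.
NOUN_FEATURES = {
    "chairman": {"human", "animate", "phys-obj"},
    "reader": {"human", "animate", "phys-obj"},
    "novel": {"phys-obj"},
    "car": {"phys-obj"},
    "recess": {"resolution"},
    "tears": {"emotional-state"},
}

# For each sense, the feature required of the noun filling each role.
SENSE_PATTERNS = {
    "MOVE 1": {"theme": "phys-obj"},
    "MOVE 5": {"theme": "human", "goal": "emotional-state"},
    "MOVE 6": {"agent": "human", "theme": "resolution"},
}


def matching_senses(arguments):
    """Return the senses whose every constrained role is filled by a
    noun carrying the required feature, most specific pattern first."""
    matches = []
    for sense, pattern in SENSE_PATTERNS.items():
        if all(required in NOUN_FEATURES.get(arguments.get(role), set())
               for role, required in pattern.items()):
            matches.append(sense)
    # Prefer the pattern that constrains the most roles.
    return sorted(matches, key=lambda s: -len(SENSE_PATTERNS[s]))


# "The chairman moved for a recess."
print(matching_senses({"agent": "chairman", "theme": "recess"}))
# "The novel moved the reader to tears."
print(matching_senses({"agent": "novel", "theme": "reader",
                       "goal": "tears"}))
```

In the second call more than one pattern is satisfied, since a
person is also a physical object, and the sketch can only rank the
candidates by how many roles each pattern constrains. That is one
small way of seeing why feature marking alone may not be sufficient
across the whole language.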



In studying the verbs of motion, Figure 6 is our current set
of argument classifications: agent, theme, source, goal,
instrument, path, medium, speed, acceleration, steadiness,
continuity, force, resistance, and orientation. Now Amsler is
making quite a detailed study of these arguments because he is
still exploring the possibility that he will be able to take the
definitions and push them onto a display scope and draw particular
kinds of squiggles to represent in real time the meaning of the
motion verb. Whether he will be able to do that or not remains to
be seen. It is clear and easy for some things and not for others.
Notice manner, of course, is liable to be an attitude, and 'thud'
is "to move with a heavy sound". If one wants to make pictures
of meanings, one needs more than a temporo-spatial frame of
reference. In the 200 or so move verbs, there are at least a dozen
that have sound and probably a couple dozen that have connotative
things going on. In 'lurk', for example, 'slyly' is one of the
connotations that goes along with it. So it is rather hard to
draw pictures of all of the move verbs.

Figure 7 is an example of how we expect to use the move verb.
"Arnold marched the army slowly through the countryside from
New York to Montreal". First we give the shallow semantics of
the sentence. C1 is a move. C1 represents this whole proposition,
this whole idea. There is a moving going on. The verb that was
used is "to march". Now I saved the verb because I just don't
want to fuss around like Goldman does trying to find my way back
to the surface representation. Some of you have read Goldman's
excellent thesis [18] on how you go from Schank's very deep
conceptual dependency structure back into making sentences, but
in some cases one can save a lot of trouble by carrying the verb
along, and not get into that. The tense -- past; the agent --
Arnold; the theme -- C2 is a token of an army. Now in fact, one
needs tokens for all words, but I am not showing the complete
computational representation; I am trying to communicate it.
I had to put a token for 'army' because there is a difference




Figure 6
Arguments of Motion Verbs

AGENT
THEME           (object moved)
SOURCE          from
GOAL            to
INSTRUMENT      (used for moving)
PATH            (course or path-of-motion)
MEDIUM
SPEED           (e.g. fast, slow)
ACCELERATION    (e.g. sudden)
STEADINESS      (e.g. steady, with jerks)
CONTINUITY      (e.g. frequently)
FORCE           (e.g. forcefully, slightly)
RESISTANCE      (e.g. with friction, as if on wheels)
ORIENTATION     (side foremost - sidle)




between what armies do in general and what this army did. So:
'move', by marching, in the past, done by Arnold, from New York,
to Montreal, through the countryside, in military formation,
speed -- slowly. Now using the type of modeling that we discussed
in the verbs of communication, the assertions associated with
'move' translate into this time-ordered series of assertions:
that it is probably true that Arnold was at New York t1 to
t-delta. It is true that the army was at New York t1 to t-delta.
It is probably true that Arnold was in the countryside from
t-delta to tn. It is true that the army was there. It is
probably true that Arnold was in Montreal from tn to tu and it
is true that the army was in Montreal from tn to tu. And t1 is
before t-delta; t-delta is before tn; tn is before tu and tu is
before now; are all true. So we have got time going on. Now
'slowly' is translated into "greater the move rate of armies than
the move rate of C2"; C2 was the token of the army; it was a
particular instantiation of the concept 'army' that occurred in
this context. And "in military formation" during the whole
period is a reasonable inference to make. Now I don't want to
make this sound very cut-and-dried because there are a lot of
other inferences that might be made. Notice, nobody said the
army was on foot. And yet 'marched' implied that it was. And
nobody said that the army was tired when they got to Montreal.
I don't know whether they were or weren't. But probably it is
the case that you would want to make that inference to understand
"Why did they sleep for 24 hours thereafter?" Well, you need to
make the possible inference that if the distance is from New York
to Montreal and if one walks or is on foot, well then one will
probably be tired. And if tired, one will want to sleep for a
long time.
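
The chain of possible inferences just described (marched, so on
foot; on foot over a long distance, so tired; tired, so wanting a
long sleep) can be sketched as simple forward chaining over rules.
This is only an illustration: the fact names, the RULES list and
the forward_chain function are hypothetical, and the sketch treats
every conclusion as a plain fact, whereas in the talk these
inferences are only possible or probable.

```python
# Each rule: (set of premises, conclusion). Hypothetical fact names.
RULES = [
    ({"marched"}, "on-foot"),
    ({"on-foot", "long-distance"}, "tired"),
    ({"tired"}, "wants-long-sleep"),
]


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# What the sentence and the New York to Montreal distance give us.
inferred = forward_chain({"marched", "long-distance"}, RULES)
print(sorted(inferred))
```

With only the fact that the army marched, the procedure stops at
"on-foot"; the tiredness and the long sleep follow only when the
distance is also known to be long.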




Figure 7
A Move Model


Arnold Marched The Army Slowly Through The Countryside From
New York to Montreal.


(C1  TOK Move, VB March, Tense Past, A Arnold,
     TH (C2 TOK Army), MED Countryside, S New-York,
     G Montreal, MAN (in Military Formation), SPEED Slowly)


((ATP     Arnold  New-York     t1       t-delta)
 (AT      Army    New-York     t1       t-delta)
 (ATP     Arnold  Countryside  t-delta  tn)
 (AT      Army    Countryside  t-delta  tn)
 (ATP     Arnold  Montreal     tn       tu)
 (AT      Army    Montreal     tn       tu)
 (GR      (Moverate Army) (Moverate C2))
 (IN      Army  Military-Formation  t1  tn)
 (Before  t1       t-delta)
 (Before  t-delta  tn)
 (Before  tn       tu)
 (Before  tu       now))
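
The step from the shallow case structure to the time-ordered
location assertions of Figure 7 can be sketched as follows. This
is a minimal illustration under stated assumptions: the dictionary
frame and the function move_assertions are hypothetical names, the
role labels A, TH, S, MED and G are taken from the figure, and the
GR (speed) and IN (manner) assertions are left out for brevity.

```python
def move_assertions(frame):
    """Expand a 'move' case frame into location and ordering assertions.

    ATP marks a location that is only probably true (the agent's);
    AT marks one that is true (the theme's).
    """
    agent, theme = frame["A"], frame["TH"]
    # The theme is at the source, then in the medium, then at the goal.
    stages = [
        (frame["S"], "t1", "t-delta"),
        (frame["MED"], "t-delta", "tn"),
        (frame["G"], "tn", "tu"),
    ]
    assertions = []
    for place, start, end in stages:
        assertions.append(("ATP", agent, place, start, end))
        assertions.append(("AT", theme, place, start, end))
    # Each time point is before the next, and the last is before now.
    times = ["t1", "t-delta", "tn", "tu", "now"]
    assertions.extend(("Before", a, b) for a, b in zip(times, times[1:]))
    return assertions


frame = {"TOK": "Move", "VB": "March", "TENSE": "Past",
         "A": "Arnold", "TH": "Army", "MED": "Countryside",
         "S": "New-York", "G": "Montreal"}

for assertion in move_assertions(frame):
    print(assertion)
```

Because the surface verb is carried along in the frame (VB March),
nothing in the expansion has to be undone to get back to the
sentence.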




I  feel  pretty  good  about  how  we  are  able  to  do  things  with 
verbs  and  I  feel  very  hopeful  that  we  can  develop  a  methodology 
for  sorting  the  dictionary  to  get  this  kind  of  information  in 
fairly  large  quantities.   With  a  generous  sponsor  some  day  we 
might  just  go  at  it  and  see  how  many  primitives  we  in  fact  need 
for  a  particular  purpose.   The  purpose  is  obviously  important  in 
terms  of  how  you  classify  things. 

I  don't  know  very  much  about  nouns.   We  have  struggled  with 
nouns.   They  also  occur  in  hierarchies  in  the  lexicon.   One 
interesting  fact  is  that  the  top  of  any  noun  hierarchy  appears 
to  be  an  argument  position  in  a  verb.   Once  again,  I  haven't 
seen  a  hundred  nouns  so  I  am  not  quite  sure  that  this  is  always 
the  case.   But  in  the  few  nouns  that  we  have  looked  at  and  taken 
up  the  hierarchy,  we  eventually  reach  an  argument  position. 
For  example,  'message'  is  eventually  defined  at  the  top  as  'that 
which  is  communicated'.   So  it  is  an  argument;  it  is  defined  as 
the  theme  of  a  basic  verb.   Now  a  book  (see  Figure  8)  is  all 
kinds  of  things.   It  is  print  on  pages  of  a  given  size.    A  hand- 
book is  a  concise  book  for  memoranda  or  notes.   A  hardback  is  a 
book  bound  with  hard  covers.   A  paperback  is  paper-covered.  A 
primer  is  a  book  that  is  used  to  teach  children  to  read  and  so  on. 
So  summarized  in  a  network  are  all  the  things  the  dictionary 
describes  about  'book'.   This  doesn't  focus  on  the  hierarchical 
structure;  this  just  shows  the  main  senses  in  which  the  word  book 
can  be  used. 

To put it all together, as far as we know, we are going to end
up handling nouns the same way we handled verbs. That is, a noun
is going to be something that has a set of arguments. It'll be
defined in terms of the top level, a high level word, with
particular arguments differentiating it from the meaning of that
high level word. And it too will be a predication like a verb,
except of course it will fit into the argument position of the
verb. So the conclusion here is that I am talking about a good
deal of work that is in progress and I am not quite sure what the
outcomes of it will be. I think at this time it is really quite
hopeful, because there seems to be a fairly clear path to
developing a methodology for extracting from these large,
carefully put together sources of semantic information that which
we can use for question-answering machines -- machines that will
eventually read text, I still hope.



Figure    8 
"Book"    Definitions 




Questions 

Jerry Hobbs [City College of N.Y.]:

Concerning  the  decomposition  of  verbs,  I  guess  the  original 
motivation  for  Ogden's  construction  of  Basic  English  [19]  was 
reducing  1800  English  verbs  to  14  or  so? 
Is  his  work  useful  at  all  to  you? 
Simmons:

I've looked at that again and again and again; it has not yet
been useful to me. I am not sure whether that is his fault or
my fault. But I haven't yet been able to get use out of it.
Hobbs: 

Concerning  the  nouns,  is  a  reasonable  way  of  looking  at  nouns 
to  simply  say,  items  in  an  index,  or  entries  in  an  index,  which 
point  to  a  large  number  of  facts?   What  you  have  to  attempt  is 
to  organize  these  facts  into  various  clusters  of  facts  which 
are  arranged  in  some  sort  of  hierarchical  kind  of  order  accord- 
ing to  the  task  that  you  are  doing? 
Simmons:

Sure,  but  isn't  that  the  same  thing  that  we  are  up  against  with 
verbs?   We  want  to  take  the  verb  and  transform  it  into  a  set  of 
assertions,  a  set  of  facts.   Similarly  we  want  to  take  the  noun 
and  transform  it  into  a  set  of  assertions,  but  sort  of  pick  up 
the  first  level  implications  of  its  meaning.   The  critical  ques- 
tion to  me  is  how  you  use  those  sets  of  assertions  to  understand 
text.   That  is  the  problem  of  deep  discourse  analysis. 
Bill Martin [MIT]:

And  the  first  thing  that  is  interesting  to  me  is  what  a  similar 
path  we  have  been  down.   I  feel  like  I  have  done  exactly  what  you 
have  done  except  I  had  to  do  it  by  hand.   I  didn't  have  a  diction- 
ary.  I  pored  over  the  dictionary  at  night,  and  my  people  have 
been  poring  over  the  dictionary.   And  just  from  hearing  you  talk 
I  know  that  people  here  can't  get  an  idea  of  how  well  this  works. 
But  this  works  actually  quite  well.   It  is  very  informative.  And 
because  I  have  done  the  same  thing  I  can  ask  certain  questions 
where  I  did  things  differently  from  you  and  I  have  wondered  why. 




For  one  thing,  in  the  case  of  'book'  —  in  the  case  of  verbs  too, 
I  brought  everything  to  a  single  level.   Rather  than  saying 
there  is  a  book  and  we  know  certain  things  about  that;  and  then 
there  is  a  handbook  under  that  and  there  are  certain  things  we 
know  about  that  which  are  not  true  of  books;  it  seems  to  be  at 
the  same  level  and  the  same  thing  with  the  verbs.  Rather  than 
saying  there  are  certain  things  about  'march'  which  have  nothing 
to  do  with  'move',  but  yet  they  take  some  general  properties 
from  move;  you  have  brought  it  all  back  to  the  same  level;  and 
that is one thing that I have done differently. I wondered why
you did it that way. Do you see what I am saying? You tend
to  translate  'march'  into  'move'  plus  a  lot  of  arguments  rather 
than  put  things  directly  on  'march'  and  then  also  know  that  it 
points  up  to  'move'  and  some  of  the  properties  come  from  there 
and  some  are  directly  on  it. 
Simmons : 

Yes.   I  am  working  on  a  hypothesis  that  we  can  describe  the  verbal 
meanings  of  English  in  something  considerably  less  than  the  20,000 
verbs  that  English  has.   So  I  really  want  to  be  able  to  put  the 
bulk  of  my  effort  on  a  few  verbs  and  put  the  distinguishing 
features  on  the  lower  level  verbs  like  'march',  and  'walk',  and 
so  on;  and  carry  the  distinguishing  features  up   to   the  primitive 
model.   I  would  like  to  have  fifty  verb  models,  if  that  is  enough, 
and  carry  the  distinguishing  features  up  as  added  assertions.  So 
that  every  word  that  is  a  descendent  of  one  of  these  higher  order 
verbs  will  have  as  true  the  assertions  of  the  higher  order  verb 
plus  its  own  distinguishing  features. 
Martin : 

Yes.  That  forces  you  to  be  able  to  do  it  in  terms  of  models  and 
features;  that  never  allows  you  the  out  of  saying  that  in  fact  I 
would  be  better  off  just  to  put  a  more  or  less  description;  but 
a  much  more  ambitious  way  of  trying  to  solve  this  whole  problem. 
Simmons : 

I  think  the  difference,  of  course,  is  that  you  have  a  problem  and 
an  application  directly  under  your  fingers.  You  know  the  pragmatics 
of  what  you  are  doing.  We  are  in  the  more  theoretical  range  of 
trying  to  see  what  we  can  do  with  the  lexicon;  and  a  general  theme 
of how we are going to use models, rather than a particular application
at this time. I think that would account for the differences.



References 

[1]  Grishman, R., Sager, N., Raze, C. and Bookchin, B., "The
Linguistic String Parser." AFIPS Conference Proceedings
Vol. 42; AFIPS Press, Montvale, N. J. 1973, pp. 427-434.

[2]  Woods, William, "Procedural Semantics for a Question-Answering
Machine," Fall Joint Computer Conference Proceedings
1968, pp. 457-471.

[3]  Woods, W. A.; Kaplan, R. W., Nash-Webber, B., "The Lunar
Sciences Natural Language Information System: Final Report."
BBN Report No. 2378, BBN Cambridge, Mass. June 1972.

[4]  Winograd, Terry, Understanding Natural Language,
New York: Academic Press, 1972.

[5]  Heidorn, George E., "Natural Language Inputs to a Simulation
Programming System," NPS-55HD, Naval Post Graduate
School, Monterey, Calif. 1972.

[6]  Sager, Naomi, et al., "An Application of Syntactic Analysis
to Information Retrieval," String Program Reports No. 6,
N.Y.U., Linguistic String Program, April 1970.

[7]  Harris, Zellig S., "Decomposition Lattices," T.D.A.P.
No. 70, U. of Pa., Linguistics Dept., 1967.

[8]  Charniak, Eugene C., "Toward a Model of Children's Story
Comprehension," AI TR-266, MIT, Cambridge, Mass. 1972.

[9]  Schank, Roger C., "Understanding Paragraphs," Istituto per
gli Studi Semantici e Cognitivi, Castagnola, Switzerland,
1974. (Order from: Centre di Documentazione della Fondazione
Delle Molle per gli studi linguistici e di communicazione
Internationale, Villa Barbariga, 30039 San Pietro di
Stra, Italy.) See also [20].

[10] Hendrix, G., "Preliminary Constructs for the Mathematical
Modelling of English Meanings," University of Texas,
Department of Computer Sciences, Working Draft, April 1975,
(not for distribution).

[11] Kucera, H. and Francis, N., Computational Analysis of
Present-Day American English, Brown University Press, 1967.




[12] Olney, J., Revard, C., and Ziff, P., "Toward the Development
of Computational Aids for Obtaining a Formal Semantic
Description of English," SDC Document SP-2766/001/00,
October 1968.

[13] Inquiries regarding the availability of the Random House
Dictionary of the English Language tapes should be directed
to Mr. Laurence Urdang, Managing Editor, Random House, Inc.,
501 Madison Avenue, New York, New York 10022.

[14] Gove, Philip B. (ed.) Webster's Third New International
Dictionary (unabridged). G. & C. Merriam Co., Publishers,
Springfield, Mass. 1971.

[15] Schank, Roger C., "The Fourteen Primitive Actions and Their
Inferences," Stanford Artificial Intelligence Laboratory
AIM-183, March 1973.

[16] Riesbeck, C., "Computer Analysis of Natural Language in
Context," Ph.D. Thesis, CS Department, Stanford, 1973.

[17] Rieger, C., "Conceptual Memory," Ph.D. Thesis, CS Department,
Stanford, 1973.

[18] Goldman, N., "The Generation of English Sentences from a
Deep Conceptual Base," Ph.D. Thesis, Computer Science
Department, Stanford Univ., Calif. 1973.

[19] Ogden, C. K., The System of Basic English, Harcourt, Brace,
and Co., New York, 1934.

[20] Schank, Roger C. (ed.), Conceptual Information Processing,
North-Holland Publishing Co., 1975 (in press).




COMPUTERIZED DISCOVERY OF SEMANTIC WORD CLASSES *
IN  SCIENTIFIC  FIELDS 

Naomi  Sager 
Linguistic  String  Project,  New  York  University 


Abstract 

A  procedure  is  described  for  automatically  obtaining  the 
semantic  classes  in  a  science  subfield.   This  procedure  is 
based on statistical co-occurrence data for words in particular
relations  in  the  text.   The  results  of  applying  this  procedure 
to  a  subfield  of  pharmacology  are  presented.     Its  use   for 
structuring  the  information  in  natural  language  texts  is 
discussed. 


In  placing  our  work  in  the  context  of  AI  research,  it  is 
helpful to distinguish two functions of language: language as
a live medium of communication between human beings, or between
a  human  being  and  a  machine,  and  language  as  the  major  means  of 
storing  and  transmitting  the  world's  knowledge.   Roughly  this 
is  the  distinction  between  the  spoken  and  the  written  word. 
Much  of  the  research  in  artificial  intelligence  has  been 
concerned  with  the  former,  either  in  the  form  of  robotics  or 
in  question-answering  systems,  in  which  the  natural  language 
processor  acts  as  an  interface  between  the  human  user  and  a 
structured  data  base.   The  data  base  has  not  been  in  question; 
it  has  for  the  most  part  been  entered  into  the  system  in  tabular 
form.   The  artificial  intelligence  task  has  been  to  interpret 
and  act  upon  the  natural  language  input  produced  by  the  user. 
We  have  been  concerned  rather  with  language  as  a  storage  medium, 
with  the  problem  of  structuring  a  data  base  which  is  given  in 
natural  language. 

These two different foci of research activity may be illustrated
by the diagram in Figure 1. Most AI work in natural
language processing has been located at the user end, whereas
our activity has been located at the data or document source
end. Our aim is to bring natural language data bases into such
a form as to make them amenable to processing in user-directed
systems for question-answering and data processing.

Figure 1

[Diagram. Box labels: REAL TEXTS, DOCUMENTS (WORLD KNOWLEDGE);
NATURAL LANGUAGE PROCESSOR; STRUCTURED DATA BASE; TABLES,
SCHEDULES, NUMERICAL; NATURAL LANGUAGE PROCESSOR; USER.]

* Research supported in part by NSF-OSIS Grant GN-39879

As  an  example,  currently  there  is  a  great  need  in  the 
medical  community  for  computer  programs  that  could  process  the 
information  in  patient  records  for  evaluation  of  health  care 
and  for  clinical  research.   This  information  is  recorded  in 
natural  language,  in  dictated  reports  of  clinic  visits,  in 
hospital  discharge  summaries,  and  the  like.   Although  the 
contents  of  these  records  are  limited  and  repetitive,  their 
form  is  necessarily  in  natural  language  (except  for  certain 
parts,  such  as  laboratory  findings) ;  e.g.,  no  hospital 
administrator  would  dare  to  request  physicians  to  substitute 
a  multiple  choice  format  for  a  dictated  discharge  summary. 
The  question  is  thus  posed  as  to  whether  there  are  methods  for 
obtaining  the  equivalent  of  a  predetermined  format  from  the 
natural  language  material,  in  cases  where  it  is  either  impossible 
or  unacceptable  to  impose  an  a  priori  structuring. 

Another  area  of  application  is  the  scientific  literature. 
In  this   area,  again,  while  there  are  review  articles  every 
few  years  in  very  small  specialized  areas  of  scientific  knowledge, 




no one has yet written a review article in the form of a table.
Nor  is  it  likely  that  that  is  going  to  happen  very  soon.   Yet 
the  exponential  growth  of  the  scientific  and  technological 
literature  and  the  pressure  to  obtain  and  use  available  results 
quickly  make   it  desirable  to  develop  computer  methods  for 
processing  the  contents  of  the  literature  store,  if  this  is 
at  all  possible.   There  are  sufficient  restrictions  in  the 
language  that  is  used  in  these  areas  to  suggest  that  natural 
language  processing  methods  can  be  used  to  extract  structures 
that  would  give  us  the  equivalent  of  a  structured  data  base. 
We  have  been  working  in  this  area,  directing  our  attention 
to  relatively  small  subject  matter  areas,  though  these  are 
relatively  large  compared  with  the  areas  that  have  so  far  been 
treated  by  computer.   We  have  worked  for  several  years  on  texts 
in  a  subfield  of  pharmacology  having  to  do  with  the  cellular 
level  action  of  digitalis,  one  of  the  main  drugs  used  in  cardiac 
therapy. The key to the initial success we have had in obtaining
a structuring of natural language material is the fact
that  the  pharmacologist  in  the  case  of  the  digitalis  texts 
(and  the  physician  in  the  case  of  medical  records)  is  not 
writing  in  English  seen  as  a  whole  language  but  in  effect  in 
a  sublanguage   of  English.   The  limitations  as  to  what  can  be 
sensibly  said  within  the  given  field  show  up  as  regularities 
in  the  linguistic  records  of  the  material.   One  can  then  view 
the  problem  of  finding  appropriate  data  structures  as  one  of 
first  discovering  the  grammar  of  the  sublanguage   of  the  given 
scientific  area. 

In  contrast  with  the  grammar  of  a  whole  language  we  expect 
the  grammar  of  a  sublanguage   to  have  categories  which  correlate 
with  semantic  categories  in  the  material;  that  instead  of 
obtaining such general classes as noun, or noun singular,
noun plural, etc., we would obtain subclasses on a grammatical
basis  which  would  represent  the  objects  and  relations  of  interest 
in  the  science.   In  the  digitalis  work  that  will  be  summarized 
presently   it  turned  out  that  this  is  indeed  the  case.   We  used 
linguistic methods to obtain the grammar of the sublanguage,
and the resulting grammatical classes were found to correspond
directly to the semantic classes of the field, as recognized
by a pharmacologist working in the field.

One  of  the  reasons  that  this  task  can  be  viewed  as  a 
grammar-writing  problem  is  the  fact  that  scientists  in  respect 
to  their  fields  display  linguistic  behavior;  that  is,  certain 
sentences  are  acceptable  and  certain  sentences  are  not  acceptable 
within  their  field,  regardless  of  the  truth  or  falsity  of  the 
sentences.   It  is  just  a  question  of  whether  they  are  possible 
sentences  within  the  field.   So,  for  example,  "the  influx  of 
sodium"  or  "sodium  flows  into  the  cell"   is  an  acceptable 
sentence  in  pharmacology.   However,  "the  cell  flows  into  sodium" 
is  just  as  readily  rejected  by  the  pharmacologist  as  an  educated 
speaker  of  English  would  reject  an  ungrammatical  sentence  in 
the  English  language.   It  is  this  kind  of  linguistic  behavior 
which  is  captured  in  the  form  of  a  grammar  of  the  science 
sublanguage . 

How is such a grammar discovered? Observe in Figure 2 that
on the input side we have subfield texts. There are two stages
of syntactic analysis, followed by two steps of statistical
analysis, the result being a set of word classes that correspond
to the semantic classes in the subfield. Then we have two steps
of pattern recognition, the last one of which is the human stage
of post-editing and putting everything together. The result is
what we call a subfield format, a kind of structuring that
represents the types of information in successive sentences of
the text. We will go over this step by step.

Figure 2
Obtaining Information Structures from Subfield Texts

SUBFIELD TEXTS
  -> STRING PROGRAM
  -> TRANSFORMATIONAL ANALYSIS
  -> SIMILARITY CALCULATION
  -> CLUSTERING PROGRAM
  -> SUBFIELD WORD CLASSES
  -> WORD CLASS PATTERNS RECOGNIZED IN SENTENCE TREES
  -> PATTERNS COMPLETED, CLASSES AUGMENTED
  -> SUBFIELD FORMATS

Consider first the two stages of syntactic analysis. These
are done by the Linguistic String Parser [1,2,3] which performs
a segmentation or a surface analysis of the sentence followed
by a second stage of transformational analysis. The type of
segmentation obtained using string analysis (illustrated in
Figure 3) greatly simplifies the following stage of deep structure
analysis. The output parse produces a great deal more
grammatical information about every component in the sentence
than is seen in Figure 3, but for present purposes the important
thing is the segmentation that is obtained. In this simple
sentence, "This results from the slowing of the influx of
potassium into the cell", "this" appears with an indication
that it is a referential to some previous item in the text;
"results from" has as its object another segment, whose verb is
"slowing" and whose object is "the influx of potassium into the
cell". Notice that each segment of this nest of embedded
structures contains a verb or verb-like element. In the first
line "results from" is a tensed verb. In the second line
"slowing" is a verb with an "ing" suffix, making it nounlike.
And in the third line we have "influx", which is morphologically
a noun but is related to "flow into".

Figure 3
Parse

This results from the slowing of the influx of
potassium into the cell.

1. THIS ( )  RESULTS FROM 2.
2. THE SLOWING OF 3.
3. THE INFLUX OF POTASSIUM INTO THE CELL

This  hierarchy  of  verbs,  which  is  obtained  explicitly 
by  transformational  analysis,  can  be  represented  as  a 
dependency  tree,  as  shown  in  Figure  4.   It  turns  out  that  in 
the  parses  there  is  already  a  semantic  separation  of  what  you 
might call the object language of the science from the metalanguage.
The lower parts of the parse tree contain the
events and the objects of interest in the field, and the
upper parts of the parse tree the human relations to these
events, in such expressions as "it was assumed that", or
"we observed that". Within the lower range there is a hierarchy
of verbs. At the lowest level here are verbs of qualitative
events; then come certain quantitative verbs and then certain
causal or sentence connecting verbs. This verb hierarchy can
be  made  explicit  by  transformational  analysis,  based  on  the 
Harris  theory  of  transformations.   In  a  tree  of  this  type  each 
verb  is  seen  to  have  certain  arguments;  at  the  lowest  level 
the  verb  has  nouns  as  its  arguments,  e.g.  "flow  into"  with 
its  arguments  "potassium"  (subject)  and  "cell"  (object) . 
This  is  operated  upon  by  "slow",  operated  on  in  turn  by  "results 
from"  which  also  has  another  argument  of  a  sentential  type. 
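The operator-argument tree just described can be sketched as a small data structure. This is only an illustration: the `Node` class and the `triples` function are hypothetical names, suffix markers such as <ING> are dropped, and nothing here is taken from the Linguistic String Parser itself.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One word of a transformational tree with its arguments (hypothetical structure)."""
    word: str
    args: list = field(default_factory=list)

def triples(node):
    """Yield (operator, first argument, second argument) for every operator in the tree.

    A missing argument is reported as None; leaf nodes (nouns) yield nothing.
    """
    if not node.args:
        return
    words = [a.word for a in node.args] + [None, None]
    yield (node.word, words[0], words[1])
    for a in node.args:
        yield from triples(a)

# The tree for "This results from the slowing of the influx of potassium into the cell."
tree = Node("RESULT FROM", [
    Node("THIS"),
    Node("SLOW", [
        Node("FLOW INTO", [Node("POTASSIUM"), Node("CELL")]),
    ]),
])

for t in triples(tree):
    print(t)
```

The last triple printed, FLOW INTO with POTASSIUM and CELL, is the kind of elementary operator-argument unit on which the later statistical steps operate.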


Figure 4
Transformational Tree

RESULT <FROM>
    THIS ( )
    SLOW <ING>
        FLOW <UX><INTO>
            POTASSIUM
            CELL




This  type  of  output  from  the  first  two  stages  of  syntactic 
analysis leaves us a tree on which we can do statistical clustering
operations to obtain the noun and verb classes. The trick
here  is  to  cluster  by  words  that  are  similar  in  their  grammatical 
position  within  the  tree,  i.e.,  have  the  same  word  as  their 
operator  or  argument.   For  example,  we  read  off  from  the  tree 
in  Figure  4  (and  several  others)  certain  word  triples,  as 
shown  in  Figure  5.   At  the  top  of  Figure  5  there  are  two 
instances  of  "move"  and  one  of  "flow",  all  of  them  having 
"potassium"  as  their  first  argument,  or  subject.   "Flow"  and 
"move"  are  similar  in  several  respects  here.   First  of  all, 
they  have  the  same  word  as  their  first  argument;  that  is,  they 
have  a  similarity  in  the  transformational  trees  by  having  the 
same  word  in  the  same  argument  position.   Nouns  also,  e.g. 
"potassium"  and  "sodium" ,  are  found  to  be  similar  by  the  same 
type  of  criterion.   That  is,  they  have  the  same  verb  as  their 
operator  while  they  are  in  a  given  argument  position. 
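The criterion of "same word in the same argument position" can be sketched as a lookup of shared contexts. The sketch below uses the triples of Figure 5 and makes two simplifying assumptions that are not in the text: the preposition and suffix markers are ignored, and only subject and object positions are recorded. The function name `contexts` is hypothetical.

```python
from collections import defaultdict

# (verb, subject, object) triples read off Figure 5, prepositions dropped.
TRIPLES = [
    ("FLOW", "POTASSIUM", "CELL"),
    ("MOVE", "POTASSIUM", "MEMBRANE"),
    ("MOVE", "POTASSIUM", "CELL"),
    ("FLOW", "SODIUM", "CELL"),
]

def contexts(triples):
    """Map each word to the set of (relation, other word) contexts it occurs in."""
    ctx = defaultdict(set)
    for verb, subj, obj in triples:
        ctx[verb].add(("subject", subj))
        ctx[verb].add(("object", obj))
        ctx[subj].add(("subject-of", verb))
        ctx[obj].add(("object-of", verb))
    return ctx

ctx = contexts(TRIPLES)
shared_verbs = ctx["FLOW"] & ctx["MOVE"]        # contexts common to the two verbs
shared_nouns = ctx["POTASSIUM"] & ctx["SODIUM"]  # contexts common to the two nouns
print(shared_verbs)
print(shared_nouns)
```

FLOW and MOVE share POTASSIUM as subject and CELL as object, while POTASSIUM and SODIUM share the operator FLOW in subject position, which is the qualitative similarity the figure illustrates.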


Figure  5 
SIMILARITY  COEFFICIENTS 

FLOW  <UX><INTO>        POTASSIUM  CELL 

MOVE  <MENT><ACROSS>   POTASSIUM  MEMBRANE 

MOVE  <MENT><INTO>     POTASSIUM  CELL 

MOVE  similar  to  FLOW 

FLOW  <UX><INTO>       POTASSIUM  CELL 

FLOW <UX><OUT OF>     SODIUM CELL

POTASSIUM  similar  to  SODIUM 




We  are  now  in  a  position  to  compare  quantitatively  the 
similarity  of  occurrences  of  words.   We  have  seen  qualitatively 
how  words  in  particular  grammatical  classes  are  similar  in  a 
tree;  now  we  will  do  it  quantitatively.   We  will  compare  every 
pair  of  words  in  the  entire  corpus  with  respect  to  six  possible 
grammatical  relations.   For  example,  in  Figure  6,  we  see  the 
data for "flow" and "move", taken from Figure 5. "Move" has
two occurrences with "potassium" as subject. "Flow" has one
such occurrence. The calculation is made on a very large
matrix with 6N entries for each of the N words [4]. Each word is
assigned a characteristic (normalized) vector based on tables
of the type illustrated in Figure 6, showing the number of
"similar" occurrences the word has vis-a-vis all other words.
We obtain a similarity coefficient, that is, a numerical
measure of how similar any two words are in our text corpus,
by simply taking the inner product of their vectors.
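As a rough numerical sketch of this calculation, the counts for "flow" and "move" can be turned into vectors and multiplied. The counts come from the first three triples of Figure 5; normalization to unit Euclidean length is an assumption (the text says only "normalized", and the details are in [4]), and the function names are hypothetical.

```python
import math

def normalize(counts):
    """Scale a sparse count vector to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {k: v / norm for k, v in counts.items()}

def similarity(a, b):
    """Inner product of two sparse vectors keyed by (relation, word)."""
    return sum(w * b.get(k, 0.0) for k, w in a.items())

# Occurrence counts by (grammatical relation, co-occurring word).
flow = {("subject", "POTASSIUM"): 1, ("object", "CELL"): 1}
move = {("subject", "POTASSIUM"): 2, ("object", "CELL"): 1, ("object", "MEMBRANE"): 1}

coeff = similarity(normalize(flow), normalize(move))
print(round(coeff, 3))  # 0.866
```

With only three triples the coefficient is of course much higher than anything seen in the real corpus; the point is only the mechanics of counting, normalizing, and multiplying.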


Figure 6
COMPUTING WORD SIMILARITIES

WORD PAIR FREQUENCIES

          POTASSIUM     POTASSIUM     CELL
          AS SUBJECT    AS OBJECT     AS OBJECT    ...
FLOW          1             0             1
MOVE          2             0             1

(ALL WORD PAIRS, EACH PAIR IN 6 POSSIBLE GRAMMATICAL RELATIONS)

NORMALIZED FLOW VECTOR  F  (6N DIMENSIONS)
NORMALIZED MOVE VECTOR  M  (6N DIMENSIONS)

SIMILARITY COEFFICIENT = F·M




We  thus  obtain  a  number  that  represents  the  similarity  of  any 
two  words  in  the  corpus,  as  illustrated  in  Figure  7.   The 
slashed  squares  contain  similarity  coefficients  that  are  above 
threshold.   The  threshold  here  is  .27.   Every  i,j  square 
contains  the  calculated  similarity  between  the  two  words  in 
the  ith  row  and  jth  column.   We  expect  words  that  have  a 
similarity coefficient greater than the threshold to be grouped
into a cluster by the clustering algorithm, as illustrated in Figure 8.

Figure 7
SIMILARITY COEFFICIENTS

[Upper-triangular matrix of similarity coefficients for the words
K, POTASSIUM, SODIUM, ELECTROLYTE, ION, and CALCIUM. Squares whose
coefficient is above threshold are slashed. In the K row the
coefficients with SODIUM, ELECTROLYTE, ION, and CALCIUM are
.151, .075, .110, and .220.]

Threshold = .27


Figure 8
WORDS WITH HIGH SIMILARITY COEFFICIENT FORM A CLUSTER

[Diagram: ION, SODIUM, CALCIUM, POTASSIUM, and ELECTROLYTE
joined by lines.]




In Figure 8 we have replaced the slashed squares of Figure 7
by lines, showing that the words that have a similarity coefficient
higher than the threshold form a cluster. And as you can
see, the cluster of Figure 8 is of ion words but they are not
all ion names; the cluster contains ion names plus classifications
of ion names. They form a cluster because of their
similar type of occurrence in the textual material. The clustering
algorithm operates by adding one word at a time to a started
cluster. A word is added to a cluster if the average of its
similarity coefficient to all of the others in the cluster is
above threshold. (Details are given in [4].)
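The admission rule just stated (average similarity to the current members above threshold) can be sketched as follows. This is a minimal illustration and not the program described in [4]: the order in which candidates are tried, the function name `grow_cluster`, and all of the similarity values in the table are assumptions made up for the example.

```python
def grow_cluster(seed, candidates, sim, threshold):
    """Grow a cluster from a seed word, adding one word at a time.

    A candidate is added when the average of its similarity coefficients
    to all current members exceeds the threshold.
    """
    cluster = [seed]
    for word in candidates:
        if word in cluster:
            continue
        avg = sum(sim.get(frozenset((word, m)), 0.0) for m in cluster) / len(cluster)
        if avg > threshold:
            cluster.append(word)
    return cluster

# Illustrative similarity coefficients (hypothetical values).
SIM = {
    frozenset(("POTASSIUM", "SODIUM")): 0.50,
    frozenset(("POTASSIUM", "CALCIUM")): 0.40,
    frozenset(("SODIUM", "CALCIUM")): 0.45,
    frozenset(("K", "POTASSIUM")): 0.30,
    frozenset(("K", "SODIUM")): 0.15,
    frozenset(("K", "CALCIUM")): 0.22,
}

cluster = grow_cluster("POTASSIUM", ["SODIUM", "CALCIUM", "K"], SIM, threshold=0.27)
print(cluster)
```

In this toy table K is similar enough to POTASSIUM alone, but its average similarity to the three members already in the cluster falls below .27, so it is left out. This shows how an averaging rule can exclude a word that has one strong link.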

I  would  like  to  show  you  the  results  of  the  experiment  that 
we  have  done  with  the  digitalis  material.   This  experiment 
involved  about  400  sentences  from  a  much  larger  corpus,  taken 
from  4  or  5  texts,  not  selected  in  any  particular  way,   except 
that  they  were  part  of  the  set  of  digitalis  texts  that  we  had 
analyzed  manually  in  an  earlier  study.   We  will  look  at  some  of 
the  clustering  output  in  detail,  to  demonstrate  that  the 
clusters  are  semantically  coherent,  so  much  so  that  we  are  able 
to  name  them.   I  should  mention  that  the  clustering  program 
generates  overlapping  clusters  and  then  we  have  a  simple  merging 
algorithm  that  merges  those  overlapping  clusters  that  have  a  2/3 
intersection.   Figures  9  and  10  show  the  output  from  the  merging 
algorithm,  i.e.  the  merged  clusters  from  a  particular  run. 
CG  stands  for  "cardiac  glycoside";  the  words  that  fell  into 
the  merged  cluster  labeled  CG  are  with  one  exception  cardiac 
glycosides  or  drug  classifiers.    There  is  one  interesting 
exception; the erythrophleum alkaloids are not glycosides, but
they are a set of related compounds which in a particular text
that  we  were  analyzing  happened  to  be  compared  with  the 
glycosides  and  therefore  they  fell  into  the   CG  cluster. 
The  output  shows  that  there  is  a  certain  semantic  coherence 
to the class generated by the clustering program. It doesn't
mean that the machine is perfect and will make a perfect
semantic class, but it does not put unrelated words into one
class, because it is based on distributional similarities of
words. It should be noted that the word classes in Figures 9
and 10 are not selected from the data. They show the complete
computer output for the run in question.

Figure 9
CLUSTERING OUTPUT: NOUN CLASSES
Run 11.13.74, T = 0.27

CG CLASS
  Column 1: CG, DIGITALIS, OUABAIN, DRUG, STROPHANTHIDIN,
            STROPHANTHIDIN 3 BROMOACETATE, STROPHANTHIN,
            CARDIOTONIC GLYCOSIDE, COMPOUND, INHIBITOR,
            ERYTHROPHLEUM ALKALOID
  Column 2: INHIBITOR, AGENT

CATION CLASS
  Column 1: CALCIUM, CA++, CA, POTASSIUM, K, SODIUM, NA+, ION,
            ELECTROLYTE, GLUCOSE
  Column 2: ION, K+

ENZYME CLASS
  NA+K+ ATPase, ENZYME, ATPase

PROTEIN CLASS
  FIBER      PROTEIN
  PROTEIN    ACTOMYOSIN
  CARDIAC

SR CLASS
  S R, SARCOPLASMIC RETICULUM

MUSCLE CLASS
  Column 1: MUSCLE, HEART MUSCLE
  Column 2: MUSCLE, VENTRICLE

CELL-TISSUE CLASS
  CELL, MYOCARDIUM

ADP CLASS
  ADP, El

SPACE CLASS
  SPACE, MILIEU

Figure 10
CLUSTERING OUTPUT: VERB CLASSES
Run 11.13.74, T = 0.27

V(MOVE)
  MOVE, TURNOVER, INTRA, EXTRA, CONCENTRATE, FLOW

V(CONTAIN)
  CONTAIN, LOSE

V(EXCITE)
  EXCITE, DEPOLARIZE

V(INCREASE)
  Column 1: INCREASE, DECREASE, CHANGE
  Column 2: AUGMENT, INCREASE

V(CAUSE)
  Column 1: CAUSE, INDUCE, PRODUCE, LINK, INTERFERE, ACT, AFFECT,
            RELATE, TOXIC
  Column 2: INFLUENCE, INHIBIT, STIMULATE, AFFECT, CONCENTRATE, ACT,
            REVERSE, INDUCE, TOXIC, PENETRATE
  Column 3: DEMONSTRATE, SIMILAR, CAUSE, RELATE, DUE TO, PRODUCE, LINK
  Column 4: OPPOSE, DIVERGE, SIMILAR
  Column 5: EFFECT, PRODUCE
  Column 6: DISSOCIATE, RELATE

Vc
  REPORT, OBSERVE

(UNNAMED)
  Column 1: TAKE, TREAT
  Column 2: TRANSPORT, EXCHANGE
  Column 3: AUGMENT, IMPROVE
  Column 4: REDUCE, INFLUENCE
  Column 5: DECREASE, MEASURE

The verb classes shown in Figure 10 are interesting. Since
the nouns have been clustered with respect to the verbs and the
verbs with respect to the nouns we have labelled some of the
verb classes in terms of their relational status to nouns. For
example, the V(MOVE) verbs are the verbs that relate ion words
to cell words. Notice that they are verbs of motion, or of
arrested motion: "move", "turnover", "concentrate", "flow".
("Intra", "extra" are treated as verbs.) The V(CONTAIN) verbs
are just the reverse; they relate cell words to ion words,
e.g. "contain" in "the cell contains sodium". Then we have a
level of quantitative verbs: "increase", "decrease", "change",
"augment", "increase". The parallel columns indicate that the
clusters were not merged by the merging algorithm. Since the
merging algorithm requires a 2/3 overlap there was no merging
of pairs; the last step of putting these clusters together was
done by simple alignment.
Notice that the computer has distinguished six different kinds
of sentence connecting clusters. We don't know as yet what
this really represents. When we did a manual analysis of this
material we put all of what seemed to be causal verbs, the
relational verbs between events, into a single class, e.g.
"cause", "induce", "produce", "link", "interfere", "act"
("act" is "act on", and "toxic" is "toxify" in this case),
"influence", "inhibit", "stimulate", and so forth. All of
these represent relations among events in the physiological
world, in particular, among events that are initiated by the
drug actions. And as you can see there are nevertheless
differences recognized by the computer program, which would be
very interesting to explore; the clustering algorithm saw the
members of these subgroups as being more closely associated
with each other than with the words in the other subgroups.
[Question]   Is toxic a verb?
[Answer]     No. But in the verb tree, predicates are treated
as verbs. In this case: Digitalis had a toxic effect. It
toxified something and "toxic" was treated as a verb.
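The 2/3 merging rule mentioned in the discussion of Figure 10 can be sketched as below. The text does not say how the overlap is measured; the sketch assumes it is the size of the intersection divided by the size of the smaller cluster, and the function names are hypothetical. Under that assumption the two quantitative-verb clusters of Figure 10 stay separate, as reported.

```python
def overlap(a, b):
    """Fraction of the smaller cluster that lies in the intersection (assumed measure)."""
    return len(a & b) / min(len(a), len(b))

def merge_clusters(clusters, ratio=2 / 3):
    """Repeatedly merge any two clusters whose overlap reaches the ratio."""
    merged = [set(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlap(merged[i], merged[j]) >= ratio:
                    merged[i] |= merged.pop(j)
                    changed = True
                    break
            if changed:
                break
    return merged

# The two quantitative-verb clusters of Figure 10 share one word out of two.
quantitative = [{"INCREASE", "DECREASE", "CHANGE"}, {"AUGMENT", "INCREASE"}]
print(merge_clusters(quantitative))

# A hypothetical pair sharing two words out of three is merged.
hypothetical = [{"CALCIUM", "SODIUM", "ION"}, {"SODIUM", "ION", "K"}]
print(merge_clusters(hypothetical))
```

A two-word cluster that shares a single word with another cluster has an overlap of 1/2, which is why, under this reading, no pairs were merged.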

To  be  complete,  we  have  to  check  what  was  missed  by  the 
algorithm  as  well   as  what  was  obtained.   We  did  a  control 
study  by  analyzing  the  same  material  manually,  obtaining  the 
semantic  classes  and  formulas  by  observation  and  by  doing  a 
rough  distributional  analysis.   We  checked  these  classes  with 
a  pharmacologist.   Then  we  compared  the  manually  obtained 
classes  with  the  computer  output.   It  turned  out  that  in  each 
of  the  given  categories   the  computer  classes  contained  all 
of  the  high  frequency  occurring  elements  that  were  in  the  manual 
classes.    For  example,  in  Figure  11,  the  manually  obtained 
cardiac  glycoside  class  contains  the  same  words  that  we  got  in 
the computer, plus some others, which have frequencies of occurrence
that taper down toward singular occurrence. The computer
was able to collect all of the similar words which had a high
frequency and missed those which were of low frequency. In an
application, this situation could be corrected by using a larger
corpus or by manually augmenting the computer generated class
(treated as a nucleus).

A  similar  result  is  seen  in  Figure  12  with  the  cation  class. 
The  computer  obtained  96  percent  of  the  cation  occurrences  which 
were  in  the  manual  class,  but  missed  a  few  of  the  others  which 
would have fitted naturally into the class.
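The occurrence-weighted comparison between computer and manual classes can be sketched with the counts of Figure 11 for the cardiac glycoside class. The sketch assumes that the bottom figure, 442, is the total number of textual occurrences of the manual class, and that the first count in the list of missed words is 11; both readings are assumptions, and the function name is hypothetical.

```python
def coverage(total_occurrences, missed_counts):
    """Fraction of manual-class occurrences accounted for by the computer class."""
    return (total_occurrences - sum(missed_counts)) / total_occurrences

# Occurrence counts of the manual-class words missing from the computer class
# (Figure 11; the first value is an assumed reading).
missed = [11, 8, 7, 7, 6, 6, 3, 2, 2, 1, 1, 1]

percent = round(100 * coverage(442, missed))
print(percent)  # 88
```

Twelve of the twenty-two manual-class words are missing from the computer class, yet together they account for only 55 of the 442 occurrences, which is the sense in which the computer class captures the high-frequency members.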

I  would  like  to  refer  back  to  the  diagram  in  Figure  2  for 
just  a  moment.   We  have  obtained  so  far  the  two  steps  of  syntactic 
analysis,  and  the  two  steps  of  statistical  analysis  which  gave  us 
the  word  classes.   Now  we  want  to  go  back  into  the  sentence  and 
obtain,  with  these  classes,  the  patterns  of  information  structures 
that  we  were  after  to  begin  with.   It  should  be  clear  from  the  way 
the  trees  were  built  and  the  way  the  words  were  clustered  from 
the  trees  that  once  we  have  the  word  classes  and  we  plug  them 
back  into  the  sentence  in  the  positions  from  which  they  were 
clustered  we  will  get  patterns  of  word  class  occurrence.  By  so 
doing,  we  will  get  either  fragmentary  or  complete  patterns 


-40- 


COMPUTER 

CG 

DIGITALIS 

OUABAIN 

DRUG 

STROPHANTHIDIN 

STROPHANTHIN 

STROPHANTHIDIN 
3  BROMOACETATE 

CARDIOTONIC  GLYCOSIDE 

COMPOUND 
INHIBITOR 


Figure  11 
CG  CLASS 

MANUAL 

CG 

DIGITALIS 

OUABAIN 

DRUG 

STROPHANTHIDIN 

STROPHANTHIN 

STROPtlANTHIDIN 
3  BROMOACETATE 

CARDIOTONIC  GLYCOSIDE 

COMPOUND 
INHIBITOR 


NO.  OF  TEXTUAL 
OCCURRENCES 


88% 


ERYTHROPHLEUM  ALKALOID  6 


RUN  n  .13.74 
T  =  0.27 


GLYCOSIDE 

AGENT 

DIGOXIN 

ACETYL  STROPHANTHIDIN 

CARDIOACTIVE  GLYCOSIDE 

DIGITALIS  GLYCOSIDE 

DIGITOXIGENIN 

STROPHANTHOSIDE 

CARDIAC  GLYCOSIDE 

DIGITOXIN 

STROPHANTHIN  K 

DIGITALIS  COMPOUND 


n 

8 
7 
7 
6 
6 
3 
2 
2 
1 
1 
1 


442 


-41- 


COMPUTER 

CALCIUM 

POTASSIUM 

SODIUM 

CA++ 

CA 

K 

ELECTROLYTE 

ION 

NA+ 

GLUCOSE 


Figure  12 
CATION  CLASS 

MANUAL 

CALCIUM 

POTASSIUM 

SODIUM 

CA+  + 

CA 

K 

ELECTROLYTE 

ION 

NA+ 

CATION 
MG"^* 

MAGNESIUM 
NA 


NO.  OF  TEXTUAL 
OCCURRENCES 


loA 

90 

53   / 

48  I 

30/    56% 

29 

17 

M 

^0 

7 

6 

3 

3 

3 

3 

412 


RUN  n  .13.74 
T  =  0.27 


-42- 


depending  on  how  much  is  present  in  each   sentence.   At  this 
point  some  human  intellectual  work  is  necessary  to  put  together 
fragments  into  more  complete  structures.   We  obtain  formats 
which  seem  to  account  quite  regularly  for  the  repeated  content 
in  many  text  sentences,  and  as  such  have  been  called  "information 
formats"  [5,6]. 
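
As a rough illustration of plugging word classes back into a sentence, the following Python sketch replaces each classified word by its class name and keeps the resulting pattern. The class members are a subset of those in Figures 11 and 12; the helper names and the sample word sequence are assumptions made for the example, and a plain word list stands in for the trees actually used in the procedure.

```python
# Word classes; the members are taken from Figures 11 and 12.
# The function names and the sample sentence are hypothetical.
WORD_CLASSES = {
    "CG": {"digitalis", "ouabain", "strophanthidin", "strophanthin"},
    "CATION": {"calcium", "potassium", "sodium", "ion"},
}


def class_of(word):
    """Return the name of the class containing `word`, or None."""
    for name, members in WORD_CLASSES.items():
        if word.lower() in members:
            return name
    return None


def pattern(words):
    """Replace each classified word by its class name; keep other words."""
    return [class_of(w) or w.lower() for w in words]


def fragment(words):
    """Keep only the class names, giving a possibly fragmentary pattern."""
    return [c for c in map(class_of, words) if c is not None]


sentence = ["Digitalis", "slows", "potassium", "influx"]
print(pattern(sentence))
print(fragment(sentence))
```

Sentences that yield the same sequence of class names are candidates for the same format; assembling fragments into complete formats remains the manual step described above.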

In  Figure  13,  we  show  the  information  format  for  the  sentence 
that  we  began  with  (in  Figure  3).   The  word  'this'  takes  up  a 
whole  unary  line.  The  conjunction  here  is  the  sentence  connecting 
verb,  "results  from",  leading  to  the  next  line,  which  contains 
2  levels  of  embedding:   the  innermost  part  is  an  elementary 
sentence,  "Potassium  flows  into  the  cell,"  operated  upon  by  the 
verb  "slows".   This  is  really  just  the  same  sentence  that  was 
seen  before,  but  it  is  a  case  now  of  a  type  of  pattern  which  we 
recognize  in  very  many  sentences  of  the  text.   The  format  for 
this  rather  complicated  field  of  the  cellular  action  of 
digitalis  has  a  very  repetitive  structure  when  seen  in  terms  of 
the  types  of  information  contained  in  factual  portions  of  text 
sentences.

Figure 13
Format of S5

[Format table. Line 5.1: THIS, with CONJ = RESULTS FROM.
Line 5.2: SLOWS operating on POTASSIUM INFLUX INTO THE CELL.]

SENTENCE: THIS RESULTS FROM THE SLOWING OF THE INFLUX OF POTASSIUM
INTO THE CELL.




Information  formats  do  not  characterize  the  argument  from 
sentence  to  sentence  within  a  given  text.   In  these  procedures 
we  are  trying  to  find  a  typology  for  the  kinds  of  information, 
the  major  classes  of  interest,  and  the  types  of  relations  that 
are  of  interest  within  a  field.   In  this  case  (Figure  14)  we 
find  that  the  repeating  pattern  is  one  where  there  is  some 
elementary  qualitative  sentence  of  either  cell  physiology 
or  biochemistry  at  the  deepest  embedded  level,  a  sentence  which 
has  a  concrete  verb,  and  concrete  nouns  as  its  subject  and  object. 
These  elementary  sentences  are  operated  upon  optionally  by  some 
quantitative  material;  and  this  in  turn  has  some  sequence  of 
causal  verbs  operating  on  it,  appearing  one  after  the  other  with 
an  initiating  drug  agent  as  the  final  subject.   We  tested  this 
format  by  going  through  a  number  of  articles   and  seeing  if  they 
could  be  formatted  according  to  this  pattern,  and  indeed  the 
large  majority  of  the  sentences  had  this  kind  of  pattern,  with 
different  detailed  sentences  within  it,  but  all  of  them  cases 
of  the  word  classes  in  particular  relations. 
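
The nesting just described can be pictured as a small data structure. The sketch below is only an illustration under stated assumptions: the class and field names are invented for the example, and the sample line combining "digitalis", "produces", and "slows" is hypothetical rather than a format line taken from the study.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ElementarySentence:
    subject: str                 # concrete noun
    verb: str                    # concrete verb
    obj: Optional[str] = None    # concrete noun, if any


@dataclass
class FormatLine:
    elementary: ElementarySentence
    quantitative: Optional[str] = None                      # optional operator
    causal_verbs: List[str] = field(default_factory=list)   # innermost first
    drug_agent: Optional[str] = None                        # final subject

    def nesting(self) -> str:
        """Show the embedding from the innermost sentence outward."""
        e = self.elementary
        text = " ".join(p for p in (e.subject, e.verb, e.obj) if p)
        if self.quantitative:
            text = f"{self.quantitative}({text})"
        for verb in self.causal_verbs:
            text = f"{verb}({text})"
        if self.drug_agent:
            text = f"{self.drug_agent}: {text}"
        return text


line = FormatLine(
    ElementarySentence("potassium", "flows into", "the cell"),
    quantitative="slows",
    causal_verbs=["produces"],
    drug_agent="digitalis",
)
print(line.nesting())
```

Testing a format against an article then amounts to checking whether each sentence can be placed into such a structure with every slot filled by a member of the expected word class.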

A  few  last  words  about  the  implementation  of  the  components 
in  the  above  procedures.   The  linguistic  string  parser  has  been 
operative  in  one  programmed  version  or  another  since  1966  [see 
1,2,3  cited  above].   I  would  just  mention  those  features  of  it 

Figure 14
PARTIAL TEXT FORMAT FOR PHARMACOLOGICAL SUBFIELD

[Each format line has the columns, in order:
N0 Q, V_SS', V, V_Q, N1 Q, V Q (P)N2 PN3, D_S, CONJ.
The labelled groups are: DRUG AGENT; DRUG ACTION; QUANTITATIVE
CHANGE; ELEMENTARY SENTENCE OF CELL PHYSIOLOGY, BIOCHEMISTRY, ...;
EXPERIMENTAL AND CLINICAL CONDITIONS; SENTENCE CONNECTIVE.]




which  make  possible  its  application  to  semantic  problems  in  text 
analysis.   First,  it  has  a  very  large  and  detailed   grammar  of 
English,  absolutely  necessary  for  any  real  text  analysis. 
Secondly  it  has  an  extensive  treatment  of  coordinate  conjunctions 
and  comparatives,  also  without  which  you  cannot  do  very  much 
real  text  analysis.   Thirdly  it  now  utilizes  a  programming 
language  which  has  been  developed  here  for  writing  the  necessary 
grammatical  and  semantic  restrictions,  and  which  also  makes  the 
stating  of  inverse  transformations  quite  straightforward  [7]. 
The  implementation  of  the  current  parser,  the  programming 
language,  and  the  transformational  component  have  been  done  by 
Ralph  Grishman.   The  transformations  are  being  slowly  entered 
into  the  system  by  Jerry  Hobbs  [8].  The  treatment  of  coordinate 
conjunctions is the work of Carol Raze. The similarity calculation
and clustering programs [ref. 4 cited above] were also
implemented  by  Ralph  Grishman,  in  cooperation  with  Lynette 
Hirschman,  who  also  did  the  linguistic  analysis. 

In  the  study  that  I  just  described  to  you,  the  transformational 
component  was  not  rich  enough  to  handle  the  text  sentences  yet, 
so  the  trees  for  that  stage  were  manually  done.   The  trees  were 
drawn by hand for the input sentences, using only such transformations
as we are now implementing in the system.

Finally,  how  can  these  results  be  used?   What  I  have  been 
describing  so  far  is  a  discovery  procedure  for  data  structures 
rather  than  an  application   of  such  structures  in  an  actual  fact 
retrieval  question-answering  situation.   Nevertheless,  we  can 
sketch how some of the components would be used in an application,
illustrated in Figure 15. Here we would be using the
components shown in the top line to obtain the appropriate data
structures, and using those shown in the bottom line to process
new texts. The arrows into the string and transformation programs
indicate a certain amount of semantic feedback, using subfield
word classes to resolve syntactic ambiguity. Some of the
problems  that  are  difficult  in  parsing  are  solved  when  you  have 
a  sublanguage  grammar.   Once  you  have  the  detailed  word  classes 
you  not  only  have  the  basis  for  stating  information  structures 


Figure 15
[Diagram of the processing components. Top line: components used to
obtain the data structures. Bottom line: components used to process
new texts. Arrows into the string and transformation programs
indicate semantic feedback.]


but you also have a semantic tool for resolving ambiguities
that arise in parsing texts, as illustrated in Figure 16. In
interpreting the sentence in Figure 16, it is clear that
changes in the ionic milieu of cells are what is produced by
digitalis, rather than that the cells are produced by digitalis.
But this ambiguity cannot be resolved without semantic
constraints that are really quite sublanguage-specific (and
are not universal semantic constraints in the language). In
this case, in the digitalis sublanguage the verb "produced"
never occurs with "cells" as its argument.
Therefore, even if we didn't know that "digitalis" was the
subject of "produces" in this sentence, we would know that
there is no such format fragment as "produce cells". The
second interpretation is far less likely than the first because
it does not fit any of the formats that have been established
for the sublanguage.

Figure  16 


SYNTACTIC  AMBIGUITIES  RESOLVED  BY 
SUBLANGUAGE  CONSTRAINTS 


Changes  in  the  ionic  milieu  of  cells  produced  by  digitalis  have 
been  known  for  many  years. 


1.  Digitalis  produces  changes      YES 

2.  Digitalis  produces  cells        NO 
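
The filtering shown in Figure 16 can be sketched in a few lines of Python. This is only an illustration: the table of attested verb and argument-class pairs, the noun-class assignments, and the function names are all assumptions standing in for the formats established for the sublanguage.

```python
# Verb / argument-class pairs attested in the sublanguage formats.
# This one-entry table is a hypothetical sample.
ATTESTED = {("produce", "CHANGE")}

# Hypothetical class assignments for the nouns in the Figure 16 sentence.
NOUN_CLASS = {"changes": "CHANGE", "cells": "CELL", "milieu": "MILIEU"}


def allowed(verb, noun):
    """True if the noun's class occurs as an argument of the verb."""
    return (verb, NOUN_CLASS.get(noun)) in ATTESTED


def resolve(verb, candidates):
    """Keep only the candidate arguments that fit an attested format."""
    return [noun for noun in candidates if allowed(verb, noun)]


# "Changes in the ionic milieu of cells produced by digitalis ..."
print(resolve("produce", ["changes", "cells"]))
```

A parser could consult such a table whenever syntax alone leaves more than one attachment open, discarding readings like "produce cells" that match no format fragment.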




References 

[1]   Sager, N., The String Parser for Scientific Literature.
      In Natural Language Processing, R. Rustin, ed., Algorithmics
      Press, New York, 1973.
[2]   Grishman, R., The Implementation of the String Parser of
      English. In Natural Language Processing, R. Rustin, ed.,
      Algorithmics Press, New York, 1973.
[3]   Grishman, R., Sager, N., Raze, C., and Bookchin, B.,
      The Linguistic String Parser. Proceedings of the 1973
      National Computer Conference, AFIPS Press, 1973, pp. 427-434.
[4]   Hirschman, L., Grishman, R., and Sager, N., Grammatically-
      based Automatic Word Class Formation. Information Storage
      and Retrieval, in press.
[5]   Sager, N., Syntactic Formatting of Scientific Information.
      Proceedings of the 1972 Fall Joint Computer Conference,
      AFIPS Conference Proceedings, Vol. 41, AFIPS Press,
      Montvale, New Jersey, 1972, pp. 791-800.
[6]   Sager, N., The Sublanguage Technique in Science Information
      Processing. Journal of the American Society for Information
      Science, Vol. 26, 1975, pp. 10-16.
[7]   Sager, N., and Grishman, R., The Restriction Language for
      Computer Grammars of Natural Language. Communications of
      the ACM, in press.
[8]   Hobbs, J., and Grishman, R., The Automated Transformational
      Analysis of English Sentences: An Implementation. In ms.




THE  OWL  CONCEPT  HIERARCHY 

William  Martin 
Massachusetts  Institute  of  Technology 

Abstract 

Some  of  the  design  principles  of  a  concept  hierarchy  are 
discussed.   The  use  of  this  hierarchy  in  the  problem-solving 
language  OWL  is  described,  and  a  simple  example  of  OWL  is 
presented. 


I  am  very  much  in  sympathy  with  the  first  speaker's  [Simmons'] 
work  and  approach.   However,  as  he  also  pointed  out,  we  are  a 
little  closer  to  a  background  of  trying  to  actually  write 
programs  that  will  do  a  particular  thing;  and  in  fact,  come  into 
this  as  people  used  to  writing  very  big,  complicated  programs 
rather  than  as  linguists.   And  that  probably,  as  he  suggested, 
accounts  for  some  of  the  differences  in  our  approach.   So  the 
first  thing  that  I  would  like  to  do,  since  I  am  not  going  to  be 
able  to  give  you  all  the  details  of  what  we  are  doing,  is  to 
indicate  some  of  the  points  of  philosophy  that  we  have  that 
might  be  different  from  what  you  might  have. 

One  thing  we  believe  is  that  representation  is  very  important. 
Many  of  the  theses  that  have  been  done  at  MIT,  I  think,  have  just 
said  "well,  it  doesn't  matter  too  much  how  you  represent  'red 
apple'  —  any  old  way  will  do  the  job"  and  obviously  there  are 
dozens  of  ways.   But  it  is  our  feeling  that,  if  you  want  to 
write  a  really  big  program,  the  size  of  the  program  you  can  write 
is  limited  by  a  certain  amount  of  complexity  that  you  can  deal 
with  and  comprehend.   And  the  right  representation  will  go  a 
long  way  towards  getting  rid  of  complexity.   You  can  make  a 
large  gain  there,  and  therefore  it  is  important  to  spend  a  lot 
of  time  figuring  out  how  to  get  the  right  representation. 

The  second  thing  is  that  the  representation  actually 
communicates to other people a lot about the right way to organize
a  field  and  to  think  about  it;  there  is  a  lot  of  information 



that can come in and be given to a person through the representation.
Other programming languages that have been sort of
domain-specific  have  done  that  in  the  past,  and  I  think  fairly 
successfully  in  some  cases.    So  it  is  worth  spending  a  lot  of 
time  on  finding  the  right  way  to  represent  this  information 
inside  the  machine.   That's  one  thing  that  I  think  is  important. 
I  also  think  that,  in  fact,  progress  can  be  made  on  that,  which 
is  a  fortunate  situation. 

Another  thing  is  that  you  need  multiple  representations. 
There  is  not  going  to  be  just  one  way  of  representing  a  sentence 
which  will  take  care  of  all  the  different  kinds  of  processing 
that  you  want  to  do  on  it.   There  should  be  some  notion  of 
canonical  form,  if  it  is  really  possible,  yet  there  must  in  all 
probability  be  more  than  one  representation.   Furthermore,  there 
will  have  to  be  several  levels  of  representation:  a  surface,  an 
intermediate,  and  a  deep  representation,  and  these  don't  express 
themselves  in  quite  the  same  way.   The  intermediate  level  seems 
to  be  very  procedural  as  we  put  it  into  our  programs  whereas 
the  deep  level  tends  to  be  more  of  the  sort  of  thing  that  can 
be  used  nonprocedurally  in  pattern  matching  and  what-not.  The 
surface  level  is  oriented  towards  details  of  the  way  English 
is  set  up.  So  we  have  more  than  one  representation  and  they  are 
not  treated  in  the  same  way.   They  won't  fulfill  the  same  sorts 
of  functions. 

Another  thing  is  that  I  don't  see  any  reason  to  believe  that 
there  is  one  set  of  primitives  that  is  going  to  turn  out  to  be 
the  only  really  decent  way  to  think  about  the  world.   I  can 
believe  that  there  would  be  lots  of  different  sets  of  primitives. 
I  go  along  with  Whorf   that  different  tribes  might  actually  have 
a  slightly  different  way  of  arranging  their  thoughts  that 
works  pretty  well  for  their  purposes  and  that  we  might  come  out 
with  a  lot  of  different  sets  of  primitives.   There  may  be  some 
more  primitive  notions  but  there  does  not  have  to  be  one  vector 
space,  one  set  that's  it. 

Another  thing  that  we  believe  is  that,  in  thinking  about  this 
whole  issue,  pattern  matching  is  probably  much  more  important 



than  being  able  to  do  logic  or  deduction,  or  even  executing 
procedures.   That  is,  if  we  design  a  language  that  has  a  lot 
of  subroutines  in  it  today,  those  subroutines  are  designed  so 
that  they  can  be  compiled  easily  and  run.   What  we  think  is 
important  here  is  that  those  subroutines  be  designed  so  that 
they  can  be  looked  at  by  the  computer  itself  and  understood 
and  read.   And  they  can  be  matched  against  various  possible 
uses  they  might  have.   Actually  executing  them  is  not  so 
important.   Very  long  chains  of  reasoning  have  never  been  too 
successfully  done  by  machines;  it  seems  that  many  programs  that 
work  have  been  able  to  classify  the  problem  as  one  that  they 
knew  the  answer  to  and  write  the  answer  down.   When  you  look 
at  it  this  way,  it  becomes  more  a  matter  of  being  extremely 
good  at  recognizing  something  as  a  problem  that  you  are  already 
familiar  with.    We  have  worked  a  lot  with  business  people  and 
doctors  and  a  little  bit  with  lawyers  and  have  come  to  the 
conclusion  that  the  experts  in  those  fields  do  that  --  they 
know  most  of  the  answer  plus  or  minus  certain  things.   What  they 
are  good  at  doing  is  figuring  out  why  a  particular  case  fits  in 
one  of  the  many  hundreds  or  perhaps  thousands  of  patterns  that 
they  have,  rather  than  doing  a  lot  of  deduction.   If  you  ask 
them  how  they  did  it,  they  can  supply  deduction  to  show  how  it 
could  have  been  done  that  way   but  in  fact  that  is  not  how  they 
get  there.   So  in  representing  their  knowledge  one  needs  something 
that  is  amenable  to  this  rather  than  to  deduction;  and  that  means 
you  would  sacrifice,  say,  efficiency  of  execution  in  order  to  get 
something  that  you  can  read  more  easily  if  you  are  a  computer. 

A  fourth  point  is  that  I  think  we  can  do  it  by  hand.   I  have 
just  been  involved  in  the  construction  of  an  extremely  large 
program  that  involved  about  fifty  man  years  of  work  by  Ph.D. 
mathematicians.   I  found  that  if  you  have  a  successful  strategy 
for  producing  a  program  that  does  something  that  people  want, 
there  is  a  great  deal  of  labor  available  that  is  very  intelligent 
and  that  will  be  applied  to  the  problem  (whether  you  want  it 
applied  or  not,  almost) .   If  you  come  up  with  a  really  winning  way 




of  putting  knowledge  in  a  computer,  dozens  of  projects  will 
automatically  spring  up  all  over  the  country  to  do  that  because 
it  will  be  so  interesting  that  people  just  won't  be  kept  out  of 
it.   So  the  most  important  thing  is  to  find  a  successful  way  of 
doing  it,  and  even  if  it  is  a  job  like  making  the  encyclopedia, 
if  a  successful  way  of  doing  it  is  found  it  will  get  done. 
Therefore  I  don't  think  that  concentrating  too  much  on  making 
sure  that  it  can  be  automated  is  a  good  idea  because  when  you 
do  that  you  often  tend  to  try  for  an  oversimplification.   If 
you automate things, you really want there to be only 11 primitives
because, as Schank has said in some of his writings, if
there  were  more,  then  think  of  all  the  work  you  would  have  to  do. 
And  I  feel  that  many  issues  come  out  in  rather  small  worlds. 
You  don't  have  to  get  too  big  a  set  of  things  in  before  you 
begin  to  have  to  cope  with  many  of  the  problems  that  have  been 
mentioned  here.   So  those  are  a  few  points  of  philosophy  I  have. 

We have a language, OWL, which we are attempting to design
in order to explore our points of philosophy. I can write down
a  procedure  for  you  in  our  language  which  we  have  an  interpreter 
for,  but  I  would  also  like  to  point  out  a  few  other  things. 
One  is  that  we  had  a  certain  small  number  of  notions  that  we 
think  are  pretty  important.   One  of  them  is  specialization. 
We  put  a  lot  of  stock  in  the  notion  that  a  token  is  in  some 
sense  just  one  example  of  the  sort  of  chain  that  you  have: 
living thing, animal, dog, terrier, Fido, Fido2, etc. And a
good  way  to  organize  the  information  is  in  fact   to  try  to 
set  up  some  kind  of  a  hierarchy  including  the  nouns,  the  verbs 
and  the  adjectives.   It  is  fairly  broad  but  not  very  deep.   And 
when  you  add  special  things  like  the  put  operation  in  the 
Winograd  blocks  world,  which  is  one  particular  kind  of  a  general 
put,   it  is  nice  to  have  some  way  of  noting  it  as  similar  yet 
different. 
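
A specialization chain of this kind can be sketched in Python as concepts that each point to the concept they specialize. The class below is a minimal illustration under assumptions of my own: the names `Concept`, `genus`, `chain`, and `is_a` are invented for the example and are not OWL's actual representation.

```python
class Concept:
    """A node in a specialization hierarchy (hypothetical sketch)."""

    def __init__(self, name, genus=None):
        self.name = name
        self.genus = genus  # the concept this one specializes, or None

    def chain(self):
        """Names from the most general concept down to this one."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.genus
        return names[::-1]

    def is_a(self, other):
        """True if `other` is this concept or lies above it in the chain."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.genus
        return False


living = Concept("living thing")
animal = Concept("animal", living)
dog = Concept("dog", animal)
terrier = Concept("terrier", dog)
fido = Concept("Fido", terrier)

# A special put is similar to the general PUT, yet different from it.
put = Concept("PUT")
blocks_put = Concept("PUT (blocks world)", put)

print(" > ".join(fido.chain()))
print(blocks_put.is_a(put), put.is_a(blocks_put))
```

Because the blocks-world put sits under the general put, anything stated about the general concept is reachable from the special one, while the special one can carry its own additions.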

So  we  have  been  playing  around  for  quite  a  while  with  this 
hierarchy; it is not easy to come up with something that everybody
believes in. And I just want to give you some idea of




where  we  stand  here. 

One  distinction  we  are  making  is  that  we  would  like  this 
hierarchy  to  be  a  tree,  yet  something  can  be  both  a  block  and 
also  a  nuisance  in  our  block  world.   If  that  block  is  a 
nuisance then we say that it is primarily a block, but also
characterize  it  in  a  secondary  way  as  a  nuisance. 

Now,  in  addition  to  that  characterization,  there  are  many 
that  are  based  on  relations;  in  particular,  the  semantic  cases. 
Somebody  is  an  agent  of  something  or  an  instrument  of  something. 
A  knife  would  be  an  instrument  because  it  can  be  used  in  some 
particular activities related to it. There are parts, for
example a top of something. Such a part is characterized
as a top because of its relation to something else, namely the
thing it is the top of. It may also be a book or something that is
being used for the top. And then we have characteristics:
bright,  color;   under  color  comes  red.    Time  and  location 
end  up  in  their  own  category;  they  seem  to  us  to  be  not  quite 
like  everything  else.   And  then  we  have  the  activities,  as 
aspects  of  things.   These  have  an  organization  which  is  similar 
to  what  we  saw  earlier  this  morning  [in  Simmons'  talk]. 
I  started  out  by  looking  at  particular  verbs.   Then  I  looked 
at  Miller's  verbs  of  motion.   I  concluded  that  this  seemed  to 
work  and  I  decided  to  say:  how  can  I  make  a  tree  for  the  whole 
thing,  without  getting  into  too  much  work.   And  then  I  had  the 
idea  to  get  Basic  English  out.   Now  the  thing  about  Basic 
English  is  we  only  have  a  few  verbs  because  Ogden  used  the 
nominal  form  for  all  other  activities.   He  doesn't  have  'hit' 
listed  as  a  verb,  but  as  a  noun.   And  this  is  just  kind  of  a 
trick  on  his  part.   He  can't  really  speak  without  having  a  lot 
of  nominal  forms  of  verbs  available  in  the  basic  language,  so 
I  converted  all  those  back  and  then  you  really  have  a  couple 
of  hundred  verbs  in  Basic  English,  not  just  6  or  8.   Working 
with  those  couple  of  hundred,  I  was  able  to  get  them  sorted  out 
fairly  well  into  some  hierarchy,  although  it  took  me  a  couple 
of  years.   It's  like  a  great  crossword  puzzle  with  lots  of 
possibilities.   But  this  is  the  kind  of  thing  that  we  are  doing. 




I am not even claiming to have the final answer here.

One class of activities is sort of a 'come-and-go';
under it you have 'go' which has more of a direction to it.
'Stand' is another activity, although obviously in a different
sense than come-and-go.

So we have this kind of a hierarchy and whenever I define a
new procedure or anything like that, everything that comes into
the machine is placed in a data base, and everything that comes
into the data base is categorized underneath its appropriate
thing. So if BLOCK:1 is mentioned, I can put it underneath
BLOCK in the data base. Now the way this would actually happen
inside the machine is that a concept would be represented as a
list. The second position in the list for BLOCK:1 would be
BLOCK, which we think of as sort of a genus, that is a thing
that the concept is a kind of, or a token of. Following that
in the list are the various properties of the concept. So
that if I wanted to indicate that the concept had the property
red then that would be put in this list as well. So we have
in the data base a set of what we think of as concepts which
are actually lists like this. It is hierarchically organized
and there are predicates that can go on every single concept
in here. So we might know some things about ON-TOP that we
don't know about ON, etc. In general, predicates are true
unless over-ridden below. So that is how everything is set up;
when I write down a procedure or a statement, you have to
realize that when that is read in the reader has to put it into
this network.
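
The list representation and the rule that predicates hold unless over-ridden below can be sketched as follows. The layout is an assumption for illustration: the talk fixes only that the genus sits in the second position and the properties follow, so using the concept's own name as the first element, the `define` and `lookup` helpers, and the sample properties are all hypothetical.

```python
# Each concept is a list: [name, genus, (predicate, value), ...].
# Putting the name in the first position is an assumption.
DATA_BASE = {}


def define(name, genus, *properties):
    """Enter a concept into the data base under its genus."""
    DATA_BASE[name] = [name, genus, *properties]


def lookup(name, predicate):
    """Return the nearest value of `predicate`, searching up the genus chain.

    A value stated on a lower concept over-rides one stated above it.
    """
    while name is not None:
        concept = DATA_BASE[name]
        for pred, value in concept[2:]:
            if pred == predicate:
                return value
        name = concept[1]  # move up to the genus
    return None


define("THING", None)
define("BLOCK", "THING", ("MOVABLE", True), ("COLOR", "GRAY"))
define("BLOCK:1", "BLOCK", ("COLOR", "RED"))  # over-rides the color of BLOCK
define("BLOCK:2", "BLOCK")

print(lookup("BLOCK:1", "COLOR"))
print(lookup("BLOCK:2", "COLOR"))
print(lookup("BLOCK:1", "MOVABLE"))
```

The same lookup would let ON-TOP carry predicates that ON lacks, since the search starts at the more specific concept.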

Suppose I wanted to write in OWL the classic procedure where
I have a block A and block B and my goal is to put block A on
top of block B with the Winograd hand (Figure 1). I do that
because then I don't have to explain the problem to people at
MIT. (I am not saying that all people there do is put A on B.)
Somewhere in the hierarchy under my activities I am going to have
PUT; but now I want to have a special form of PUT; I don't plan
to be able to write down a putting procedure which takes into
account all puts which you will ever do, so I am going to




[:PUT=(PUT (LOCATION BLOCK:1 ((ON TOP) BLOCK:2)))
(ARCHETYPE PUT :PUT)
(OBJECT :PUT BLOCK:1)
(AGENT :PUT PERSON:)
(INSTRUMENT :PUT HAND:)
(PART (AGENT :PUT) HAND:)
(LOCATION :PUT ((ON TOP) BLOCK:2))

(PRINCIPAL-RESULT :PUT (LOCATION (OBJECT :PUT) (LOCATION :PUT)))
(METHOD :PUT (FIND SPACE:))
(LOCATION SPACE: (LOCATION :PUT))
(ASSIGNMENT-FOR SPACE: (OBJECT :PUT))
(THEN (FIND SPACE:) (GRASP (OBJECT :PUT)))
:HAND=(INSTRUMENT ((GRASP (OBJECT :PUT)) THE))
(THEN (GRASP (OBJECT :PUT)) (MOVE :HAND))
(DESTINATION (MOVE :HAND) LOCATION:)

(THEN (MOVE :HAND) (BECOME (PRINCIPAL-RESULT :PUT)))
(THEN (BECOME (PRINCIPAL-RESULT :PUT)) ((LET GO) (OBJECT :PUT)))
(Y-COORDINATE
  LOCATION:
  (PLUS
    2
    (Y-COORDINATE (LOCATION (SPACE: THE)))
    (DIMENSION (HEIGHT (OBJECT :PUT)))))
(X-COORDINATE
  LOCATION:
  (X-COORDINATE (LOCATION (SPACE: THE))))]

Figure 1.   The OWL Procedure PUT

specialize  the  put  I'm  defining  here.   That  is,  I  am  going  to 
put PUT in the genus position, but what do I use as the specializer,
which should be the index or the key thing about it? Well
I'll  use  the  goal  of  the  putting.   So  I  am  going  to  make  PUT 
the  genus  and  the  specialization  will  be  the  goal,  that  the 




location of one block must be on top of some other block. So
this particular PUT is specialized by (LOCATION BLOCK:1 ((ON TOP)
BLOCK:2)) and the reader is going to put it under the general
PUT in the data base. That is a particular ON TOP there so
under ON TOP in the data base I am going to have ON TOP BLOCK:2;
similarly, BLOCK:1 and BLOCK:2 will be under BLOCK. Thus everything
that is read in is sorted out by concept under these various
categories.
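
The filing done by the reader can be sketched in Python as a walk over a nested expression. This is an illustration under assumptions, not the OWL reader: expressions are modelled as nested tuples, any two-element expression is treated as a genus followed by its specializer, and a token such as BLOCK:1 is filed under the name before the colon.

```python
from collections import defaultdict


def show(expr):
    """Render a nested tuple in parenthesized form."""
    if isinstance(expr, tuple):
        return "(" + " ".join(show(e) for e in expr) + ")"
    return expr


def file_under(expr, index):
    """File every token and every (genus specializer) pair under its genus."""
    if isinstance(expr, tuple):
        if len(expr) == 2:
            # Assumed convention: first element is the genus.
            index[show(expr[0])].add(show(expr))
        for sub in expr:
            file_under(sub, index)
    elif ":" in expr:
        genus = expr.split(":")[0]
        if genus:  # skip references such as :PUT, which have no genus prefix
            index[genus].add(expr)


index = defaultdict(set)
put = ("PUT", ("LOCATION", "BLOCK:1", (("ON", "TOP"), "BLOCK:2")))
file_under(put, index)

for genus in sorted(index):
    print(genus, "->", sorted(index[genus]))
```

After the walk, the specialized PUT is found under PUT, ((ON TOP) BLOCK:2) under (ON TOP), and both block tokens under BLOCK, which mirrors the sorting by concept described above.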

Next I want to give the arguments of the procedure. I use
the semantic cases for that. I don't use the notion of patient
because I save that for the deep structure. The object of this
put is in fact BLOCK:1, the one that is going to end up on top
of  the  other  block  when  that  action  is  over.   That  is  the  object 
of  the  action.   And  the  agent  that  can  do  the  put  is  going  to  be 
a  person.   That's  who  can  do  it. 

I'll  give  the  benefit  of  the  doubt  and  let  the  computer  be  a 
person  here,  because  it  is  going  to  have  a  personality  and  it  can 
do  all  this.   The  instrument  of  this  put  is  going  to  be  a  hand; 
actually  there  is  a  left  and  a  right  available.   So  we  have  to 
take  one  of  those  and  I  want  to  make  the  constraint  that  the 
hand is a part of the agent, (PART AGENT HAND:). I don't want
to  speak  of  any  old  hand  but  rather  of  the  hand  that  is  a  part 
of  the  agent;  it  is  the  hand  that  I  want  to  have  him  use.   Now 
why  did  I  use  hand  here  while  I  used  agent  there;  this  is  a  trick 
we  are  playing  around  with.   Whenever  I  call  this  procedure  I 
have  the  following  situation.   There  will  be  arguments,  and  then 
there  will  be  some  prerequisites  that  will  necessarily  have  to 
be  true  in  order  to  do  the  procedure,  and  finally  there  will  be 
a  method  for  carrying  it  out.   It  is  broken  down  this  way  in 
part  to  make  it  easier  to  read  and  to  modify  for  automatic 
programming  purposes.   If  you  have  read  Sussman's  thesis, 
you  will  see  how  he  uses  the  notion  of  prerequisites,  and  we 
get  kind  of  a  pattern  matching  notation  out  of  this  part. 

Suppose  I  call  this  procedure  by  writing  down  'put  A  on  top 
of  B'.   Well  I  have  a  put  here.   So  coming  down  on  my  chain  I 


* BLOCK: designates a token of type BLOCK.




look for PUT. Now, I have several PUTs here and I am looking
for one which will be a method for doing this particular kind
of put. One PUT here is specialized by (LOCATION BLOCK:1
((ON TOP) BLOCK:2)), where BLOCK:1 is the object of it, and
((ON TOP) BLOCK:2) is the location that comes in. So the
first thing that I have to do when I am looking for a particular
put procedure that I could use with this function call is match
whatever occurs in the specializer of the call. So in this
case I have to match the object and I have to match the
location. If those match, then I try to match the rest of the
arguments, where 'match' means that I can show that the thing
in the argument position can be characterized as the thing
that the argument calls for. If I can match the arguments,
then I am secure enough that this is the right procedure to
be willing to try and start to satisfy the prerequisites.
Satisfying a prerequisite in this case might mean something
like getting C off of B, that is changing the world in a way
that I wouldn't necessarily want to do unless I was pretty sure
that this procedure was going to work. So in matching the
arguments, I am not allowed to change the world. Matching the
prerequisites, I am allowed to change the world. Finally I
carry out the method. So that I am easing my way into this.
There are four stages of the method. The first is trying the
index, then matching the rest of the arguments, then prerequisite
changes and then the method. When I am matching the instrument
here, what I have to do is find out its properties; it has to be
a hand, but not only that. Since our data base is completely
backpointered, when I make the statement (PART AGENT HAND:), it
shows up as a property of HAND:. The hand in (INSTRUMENT PUT:
HAND:) and the hand in (PART AGENT HAND:) are the exact same
thing inside the computer. There is only one hand; it would be
a part, I guess. Under that would be this particular hand.
If you're familiar with LISP atoms, you can think of expressions
containing HAND: as properties of HAND:. So this part of agent
here is a property of HAND:. When I get a hand as instrument,




I  check  to  see  that  it  is  part  of  the  agent.   By  writing  the 
expression  with  agent  I  am  saying  that  when  you  check  that  the 
agent  is  a  person  don't  bother  to  check  if  he  has  a  hand  which 
is  the  instrument,  because  there  is  no  back-pointer  on  PERSON: 
to (PART AGENT HAND:). Rather it's on AGENT. Don't bother to
check  about  the  hand  because  the  hand  might  not  be  given;  don't 
worry  about  that.   But  if  you  get  the  hand,  then  you  have  to 
have  the  agent.    This  is  the  kind  of  issue  in  general  that  we 
play  around  with.   Now,  there  are  some  more  arguments  to  the 
procedure.   And  then  we  get  to  the  method  of  doing  it.   The 
first  thing  that  you  do  is  to  find  a  space.   So  the  first  step 
of  the  method  is  a  function  call  to  the  find  procedure  for  the 
space  to  put  the  block  in.   And  then,  after  you  do  the  FIND  SPACE 
you grasp the object, where by object I am referring to the
object of the procedure that you are doing, and the OWL reader
will  make  that  explicit.   And  then  you  do  a  set  of  other  steps 
like  that.    This  space  in  turn  must  be  qualified  by  a  couple 
of  additional  statements.   I  want  to  say  that  the  location  of 
the  space  that  we  are  going  to  find  has  to  be  the  location  that 
is  given  in  the  arguments  to  the  procedure.   It  also  has  to  be 
a  new  space;  we  don't  want  to  find  some  space  we  already  knew 
about.  So,  here  I  am  calling  this  procedure,  FIND,  and  I  am 
specializing  that  by  the  argument  space,  and  I  am  modifying  that 
with  some  information  that  I  have  gotten  out  of  the  arguments 
to  put. 
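
The four stages just described (index, argument matching, prerequisites, method) can be sketched in a few lines of Python. This is only an illustration and not OWL itself: the dictionary layout, the names invoke, is_block, clear_top and put_method, and the picture of the world as a table from each block to whatever it rests on are all assumptions made for the example. The one point it tries to capture is that argument matching sees a read-only view of the world, while prerequisites and the method may change it.

```python
from types import MappingProxyType


def is_block(world, value):
    """Argument test: the value must name a block known to the world."""
    return value in world


def clear_top(world, args):
    """Prerequisite: move anything resting on the destination to the table."""
    for block, support in list(world.items()):
        if support == args["location"]:
            world[block] = "table"


def put_method(world, args):
    """Method: place the object on the destination."""
    world[args["object"]] = args["location"]


# Hypothetical procedure record; OWL's real representation is not shown in the talk.
PUT = {
    "index": "PUT",
    "argument_tests": {"object": is_block, "location": is_block},
    "prerequisites": [clear_top],
    "method": put_method,
}


def invoke(procedure, name, args, world):
    # Stage 1: try the index.
    if procedure["index"] != name:
        return False
    # Stage 2: match the remaining arguments without changing the world.
    read_only = MappingProxyType(world)
    for role, test in procedure["argument_tests"].items():
        if role not in args or not test(read_only, args[role]):
            return False
    # Stage 3: satisfy the prerequisites; this is allowed to change the world.
    for prerequisite in procedure["prerequisites"]:
        prerequisite(world, args)
    # Stage 4: carry out the method.
    procedure["method"](world, args)
    return True


# C starts on B; putting A on B first gets C off of B.
world = {"A": "table", "B": "table", "C": "B"}
succeeded = invoke(PUT, "PUT", {"object": "A", "location": "B"}, world)
```

If the arguments fail to match (for instance, an object that is not a known block), invoke returns False before any prerequisite has run, so the world is left as it was.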

I  will  give  you  an  idea  of  what  we  are  trying  to  accomplish. 
We  are  on  our  third  implementation  of  the  interpreter  now.   We 
have  scrapped  two.   We  have  a  Woods-type  parser  that  is  a  simpler 
version  of  his  parser;  it  is  not  as  clever  about  backing  up  off 
wrong  paths.   It  can  use  the  cases  that  I  have  mentioned  as 
arguments  and  their  semantic  types  along  with  a  notion  of  how 
prepositions  and  places  in  a  sentence  flag  various  cases.  We 
actually  use  a  procedure  definition  like  the  one  above  as  semantic 
information in order to parse sentences. So that it will parse
"put A on top of B" into (PUT (LOCATION A ((ON TOP B) ) ) by saying
well,  A  is  the  object  of  put,  let's  see  is  A  a  block;  well,  if 
we  are  going  to  use  this  PUT  it  has  to  be  a  block,  etc.    Then 
also,  the  interpreter  can  actually  run  this  procedure.   To  run 
such  an  abstract  procedure  it  has  to  be  quite  smart  and  the  way 
that  we  make  the  interpreter  smart  is  to  have  it  constantly  stop 
and  go  to  the  data  base  for  user  instructions.   We  write  a  lot 
of  procedures  to  help  the  interpreter  out  of  its  difficulties; 
in  that  way  it  can  be  made  smart  enough.   We  can  always  add  more 
—  we  don't  have  the  notion  of  a  kind  of  closed  structure  there 
and  the  interpreter  declares  all  of  its  stack  and  all  of  that 
into  the  data  base  so  we  can  have  protocols  to  work  on.   So  we 
interpret  OWL;  we  use  it  for  parsing;  and  also  in  the  case  of 
automatic  programming  another  program  can  come  through  and  read 
an  OWL  procedure  and  find  a  more  efficient  way  of  doing  it  for 
a  particular  problem.   For  example,  if  we  know  that  prerequisites 
are  always  satisfied,  we  can  leave  out  worrying  about  checking 
them  etc.  and  take  that  code  away.   Finally  the  machine  should 
be  able  to  explain  OWL  by  translating  it  back  into  English,  since 
OWL  is  not  that  far  from  English. 

Now  I  don't  believe  that  you  should  have  a  deeper  level  of 
interpretation of these things in order to do the kind of reasoning
you were talking about a while ago. I do believe that what
you  have  to  do  in  that  case  is  run  back  up  the  tree,  you  have  a 
stack of models very similar to Simmons', and you rely on those
when  you  are  trying  to  get  the  essence  of  what  happened.  Yes 
indeed,  the  block  is  now  at  the  other  location,  because  put 
implies  causing  this  transition  to  occur  and  getting  the  goal. 
You  have  that  information  at  the  top  of  the  tree  but  you  are 
allowed  to  put  down  at  the  bottom  various  details  necessary  to 
get  things  done.   You  have  to  have  both.   It  is  two  different 
ways  of  dealing  with  the  thing  that  you  are  trying  to  accomplish. 


DESIGN  OF  THE  UNDERLYING  STRUCTURE  FOR  A  DATA  BASE  RETRIEVAL  SYSTEM 

Stanley  R.  Petrick 
T.J.  Watson  Research  Center,  IBM  Corporation,  Yorktown  Heights,  NY 


The REQUEST system is a data base retrieval system
in which natural language queries are processed by a transformational
grammar. Several features of the underlying representation
produced by this grammar are described, and some examples
are presented. The extendability of this system is considered,
and compared with that of alternative representations and systems.


The  symposium  chairman's  abstract  for  this  seminar  divided 
natural  language  processing  systems  into  two  groups:   those  which 
perform a deep sentence analysis using detailed semantic information
about a very limited subject area and those that perform a
shallow  analysis  utilizing  a  minimal  amount  of  semantic  information 
over  a  wide  range  of  vocabulary.   It  went  on  to  say  that  extending 
the  deep  systems  to  broader  subject  areas  will  require  the  assembly 
of  much  larger  bodies  of  semantic  information. 

In  his  introduction  this  morning,  Ralph  Grishman  gave  the 
opinion  that  in  the  extension  of  these  deep  systems  to  somewhat 
wider  systems  the  syntactic  part  is  pretty  well  in  hand  and  its 
extension  will  be  possible  with  a  reasonable  amount  of  extra 
effort,  but  in  the  area  of  semantics  something  more  like  an  order 
of  magnitude  of  effort  is  required  and  maybe  we  don't  know  how 
to  go  about  it  anyway  because  things  are  so  ill-structured  for 
semantic  processing. 

I  have  been  asked  to  discuss  possible  techniques  learned  from 
research  on  current  systems  for  gathering  and  organizing  such 
semantic  data  so  I  will  be  addressing  myself  to  that. 

During  the  past  few  years  at  the  IBM  Research  Center  I  have 
in fact been associated with a group that has been making an
effort to implement a question answering system of the deep
type  that  was  cited  in  the  abstract.   I  would  classify  our  question 
answering  system,  which  we  call  the  'REQUEST  System',  as 
belonging  to  this  deep  type  for  several  reasons.  First  of  all, 
the  syntactic  structures  which  are  assigned  to  sentences  by  a 
transformational  grammar  in  our  system  reflect  a  relatively  deep 
representation  of  entities  and  the  relationships  between  them. 
I  would  characterize  our  structures  as  relatively  deep  because 
although  they  are  deeper  than  what  most  people  in  computational 
linguistics  call  deep  structures  they  are  still  not  as  deep  as, 
for  example,  Roger  Schank's  in  which  he  limits  himself  to 
approximately  eleven  primitives  and  insists  that  all  synonymous 
sentences  come  from  the  same  underlying  structures. 

The  second  reason  I  would  include  the  REQUEST  work  among  the 
deep  rather  than  the  shallow  approaches  is  because  it  makes  use 
of  detailed  semantic  information  about  a  very  limited  subject 
area. 

We  began  our  efforts  by  focusing  on  the  answering  of  questions 
relative  to  a  tabular  data  base;  in  particular,  that  which  is 
contained  in  the  Fortune  500  summary  of  annual  business  statistics. 
This is illustrated in Figure 1. Our current data base contains
eight fields of information (such as headquarters location, sales
and earnings) for the top 50 companies in the Fortune 500 over
the time interval 1967 - 1972.

Figure 1
THE BUSINESS STATISTICS WORLD

Year   Company         Headquarters     Sales   Earnings

1971   G M             Detroit
       FORD            Dearborn
       STD OIL (NJ)    New York
       G E             New York
       IBM             Armonk, N. Y.
1970   G M
       FORD
       STD OIL (NJ)
1969   G M
       FORD

[No Sales or Earnings entries survive in this copy of the figure.]

Now  although  our  goal  was  a  system  that  was  general  enough, 
and  modular  enough,  and  logically  coherent  enough   to  make 
possible  its  extension  to  a  wide  variety  of  other  applications, 
it  is  nevertheless  true  that  certain  of  our  initial  efforts  in 
syntax  as  well  as  semantics  were  directed  specifically  toward 
meeting  the  requirements  of  our  Fortune  500  application.   We 
have  one  transformation,  for  example,  that  resupplies  information 
that  a  headquarters  is  involved.   In  our  system,  when  you  ask 
"Where  is  IBM  located?"   this  means  you  want  to  know  where  the 
headquarters  of  IBM  is  located,  and  we  are  actually  using  a 
transformation to introduce that information. This transformation
is obviously not valid in a larger application. But except
for a relatively small number of such cases, most of our transformations
are more general, and in fact very general, we would
claim. 

Just  the  same,  we  did  insist  from  the  beginning  upon  a  modular 
system  that  was  based  soundly  on  linguistic  theory.   To  this  end, 
we  made  use  of  a  general  transformational  syntactic  analysis 
component  which  is  valid  for  any  of  a  number  of  transformational 
grammars  belonging  to  an  allowable  class  of  grammars.   In 
addition, we implemented a general facility for semantic interpretation
based on a model due to Knuth which accepts particular
application-specific  translation  equations  and  their  associated 
predicates  and  functions.   Now  we  originally  hoped  to  successfully 
complete  work  on  the  pilot  Fortune  500  project  and  then  to  go  on 
to  other  applications;  and  we  have  in  fact  achieved  a  certain 
level  of  success  which  is  perhaps  best  illustrated  by  giving 
you some sample dialogs.

The  first  of  these  is  illustrated  in  Figure  2.   In  the  first 
example  the  user  types  his  question,  "Did  Chrysler  make  a  profit 
in  1969?",  after  being  prompted  by  the  computer,  and  the  system 
repeats the question to confirm it was entered correctly. After
about six seconds (6154 milliseconds) the answer (YES) was
printed. This time, as well as that of the other sentences
I will discuss, has been cut in half since the computer runs
in question were made. This was not due to our cleverness
but because we replaced our computer by one which is twice as
fast. It has been comforting to note that as our grammar has
been growing in size and complexity and in the range of
semantic phenomena that we have been able to take care of,
sentence parsing time has been continually going down and
exactly for this reason; twice we have replaced our computer
by a faster model. So technology is at least keeping up with
our ability to complicate things.

Figure 2
REQUEST DIALOG

QUESTION?

Did Chrysler make a profit in 1969?

DID CHRYSLER MAKE A PROFIT IN 1969?

6154 ANSWERS:

1:   YES

NEXT QUESTION?

What company ranked fifth in 1971 sales?

WHAT COMPANY RANKED FIFTH IN 1971 SALES?

21325 ANSWERS:

1:   IBM

The next question in Figure 2 is, "What company ranked fifth
in 1971 sales?" and in somewhat more time the system provides the
answer, IBM.

We  have  a  very  limited  facility  for  being  able  to  interact 
with  the  user  either  at  the  level  of  the  initial  input,  or  at 
the  level  of  interpretation  of  the  answer,  or  in  supplying  some 
miscellaneous information that the person might be interested in.
In the sentence of Figure 3 the input was typed incorrectly.

Figure  3 

NEXT  QUESTION? 

How large anumber of people did companies in Chicago employ in 1969?

"ANUMBER" IS NOT IN LEXICON.   RETYPE THE WORD OR THE QUESTION.

a number

22115 ANSWERS:

                                  1969
                             EMPLOYEES

ARMOUR                          31,700
STANDARD OIL IND.               48,190
INTERNATIONAL HARVESTER        104,160
SWIFT                           32,700
BEATRICE FOODS                  30,000


The   words    "a"    and   "number"    were    run    together,    and   the    resulting 
word  was    not   in    the    lexicon.       In    such    a   case   you  want    to   retype 
the    incorrect  portion    and   it   allows   you   to   do   that    instead   of 
having   to   retype    the   whole    thing.      Then    the    system  provides    the 
answers.       In    this    case    the    logical    answer   is    a    list   of   numbers 
of   people   employed,    but    there    is    the    implication    that  you   are 
interested   in  which    companies   employed   those   people.       In    such 
cases    it   is   necessary    to   supply    the    implied   data,    and   this   data 
is    often    conveniently   supplied   in   the    form  of   a    table    such    as 
the    one    of   Figure    3   that    shows   what    the    companies    in   Chicago 
were    and  how  many   people   each   employed   in    1969. 
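
The retyping facility can be pictured with a small Python sketch. It is not the REQUEST code: the lexicon contents and the function names unknown_words and retype are invented for the illustration, and treating digit strings such as years as always known is an assumption. The sketch checks each typed word against the lexicon and lets the user replace only the offending word.

```python
import string

# Hypothetical lexicon, just large enough for the example question.
LEXICON = {"how", "large", "a", "number", "of", "people", "did",
           "companies", "in", "chicago", "employ"}


def strip_punct(word):
    """Drop leading and trailing punctuation such as a final question mark."""
    return word.strip(string.punctuation)


def unknown_words(question, lexicon=LEXICON):
    """Return the words of the question that the lexicon does not contain."""
    unknown = []
    for word in question.split():
        core = strip_punct(word)
        # Assumption: numbers such as years are accepted without a lexicon entry.
        if core and not core.isdigit() and core.lower() not in lexicon:
            unknown.append(core)
    return unknown


def retype(question, bad_word, correction):
    """Replace only the offending word, keeping the rest of the question as typed."""
    return " ".join(
        word.replace(bad_word, correction) if strip_punct(word) == bad_word else word
        for word in question.split()
    )


question = "How large anumber of people did companies in Chicago employ in 1969?"
missing = unknown_words(question)
fixed = retype(question, "anumber", "a number")
```

Here missing names the single unrecognized word, and fixed is the question with that word replaced, ready to be checked against the lexicon again.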

One final example will show you the limit of what we can
handle (which begins to approach the limit or maybe exceeds the
limit of what you can also understand). The sentence in question
is, "What were the 1968-70 earnings of companies not located in
New York City whose 1971 assets were larger than General Electric's
1972 assets?" The last time this sentence was processed it took
just under a minute of CPU time. The resulting tabular answer
is shown in Figure 4. But complicating this particular sentence
by replacing "General Electric" by a noun phrase with another
relative clause that happens to evaluate to "General Electric"
(i.e., "the company that...") pushed us over the top. I haven't
attempted to reprocess this sentence with our faster machine,
but it is still likely to be very time-consuming.

Figure 4

Answers to the question: "WHAT WERE THE 1968-1970 EARNINGS OF
COMPANIES NOT LOCATED IN NEW YORK CITY WHOSE 1971 ASSETS WERE
LARGER THAN GENERAL ELECTRIC'S 1972 ASSETS?"

ANSWERS:

                                NET_INCOME
                                         $

1968

GENERAL MOTORS               1,731,915,000
FORD                           626,600,000
IBM                            871,498,000
GULF OIL                       626,319,000
STANDARD OIL OF CALIF.         451,831,000

1969

GENERAL MOTORS               1,710,695,000
FORD                           546,500,000
IBM                            933,873,000
GULF OIL                       610,558,000
STANDARD OIL OF CALIF.         453,786,000

1970

GENERAL MOTORS                 609,087,000
FORD                           515,700,000
IBM                          1,017,521,000
GULF OIL                       550,366,000
STANDARD OIL OF CALIF.         454,817,000

The  difficulty  of  understanding  sentences  such  as  the  previous 
one  is  sufficiently  high  as  to  make  the  occurrence  of  such  complex 
sentences unlikely. If they do occur they would require significant
processing time but not so much as to preclude the research
purpose  usage  for  which  the  system  was  intended.   Usage  on  a 
production  basis  would  require  a  new  implementation  anyway  for 
reasons  unrelated  to  speed  requirements. 

I  have  complained  for  several  years  that  reports  on  existing 
question  answering  systems  often  don't  provide  the  reader  with 
enough  information  to  estimate  the  coverage  of  questions  which 
is  provided.   In  fact,  many  such  reports  give  the  impression  that 
the  coverage  is  much  larger  than  it  really  is.   We  hope  to  avoid 
this  criticism  of  the  REQUEST  System.   That  is  one  reason  why  we 
haven't  published  more  about  it.   Although  we  believe  that  our 
coverage  is  as  large  as  that  of  any  existing  system  and  larger 
than  most,  we  acknowledge  that  it  is  nowhere  near  as  large  as 
we  wish  it  were.   Nevertheless,  we  feel  we  have  made  slow  and 
steady  progress;  it  has  been  rather  hard-won  progress  and  there 
are  many  gaps  that  still  remain  to  be  filled.   Every  time  we  add 
a  new  syntactic  construction  we  are  careful  to  make  sure  that  it 
works  completely  with  all  the  others  that  we  have  up  to  that  time. 
Ensuring that a construction interacts correctly with all of the
previously  considered  syntactic  phenomena  is  a  task  which  requires 
careful,  painstaking  effort.   It  is  difficult  enough  when  pursued 
from  a  transformational  approach  but  even  more  difficult  when 
pursued  through  other  models,  which  are  more  vulnerable  to  the 
problem  of  unwanted  and  erroneous  interaction  between  components 
designed  to  process  isolated  instances  of  distinct  syntactic 
phenomena  which  can  co-occur. 

One  can  of  course  fake  it;  you  can  handle  a  few  sentences  of 
each  kind  quite  easily.   For  example,  if  you  were  satisfied 
just  to  handle  certain  comparatives  but  not  others,  you  could 
do  that  in  pretty  much  of  a  hurry,  but  then  you  would  run  into 
a  problem;  your  users  would  assume  they  could  use  comparatives 
in all natural ways and would quickly start using them in ways
the  system  couldn't  handle.   The  user  would  then  be  in  a  quandary 
as  to  what  he  could  or  couldn't  say.   We  want  to  avoid  this 
problem subject only to the limitations of the particular application
we are dealing with at the time, allowing the user to
express  himself  in  a  natural  and  flexible  way. 

As  I  said,  we  wanted  to  be  able  to  handle  the  Fortune  500 
system  and  then  go  on  to  other  things  after  that,  but  we  found 
that  adding  new  syntactic  constructions  is  a  lot  slower  and 
harder  than  we  had  anticipated.   And  so  while  we  are  still 
pursuing  this  we  thought  it  was  not  too  soon  to  investigate  the 
problem  of  how  hard  or  how  easy  it  would  be  to  consider  a  new 
application  and  to  find  out  what  kinds  of  problems  would  we  be 
faced  with  if  we  did  so. 

This  consideration  of  how  to  extend  our  system  to  other  areas 
and  how  large  a  task  that  would  be  is  germane  to  this  symposium; 
but in order to say very much more about it than surface generalities
I think it is necessary to tell you at least a little bit
more  about  our  REQUEST  System. 

Some of the key words and salient characteristics of that
system are given in Figure 5.

Figure 5
REQUEST

-- An experimental Restricted English QUESTion-answering
   system based on:

     • a transformational grammar

     • a transformational parser

     • a Knuth-style semantic interpreter

     • a set of processing functions

-- Implemented in LISP under VM/168

-- Currently capable of answering a variety of English
   questions relating to a small Fortune 500-type data base.

REQUEST is based on a transformational
grammar in much the way that a linguist would write it. It
makes  use  of  a  transformational  parser  that  is  implemented 
once  and  for  all  and  is  applicable  to  any  of  the  grammars  that 
belong  to  the  class  of  grammars  that  is  allowable.  The  semantic 
interpreter,  likewise,  is  a  very  general  device,  and  it  can  be 
specialized  for  the  particular  deep  structures  that  our  grammar 
assigns. And then there is a set of processing functions.
REQUEST is implemented in LISP under VM/168 at the moment; and
it  answers  questions  on  the  Fortune  500.   To  tell  you  just  a 
little  bit  more  about  it  without  overwhelming  you,  consider 
Figure  6.   The  main  idea  here  is  that  there  are  two  components, 
a  syntactic  (transformational)  component  which  is  the  top 
portion  of  the  figure  and  a  semantic  component  that  is  shown 
below.   The  square  boxes  represent  parts  that  are  done  once  and 
for  all  in  the  system  and  which  don't  have  to  be  redone  for 
another  application.   So,  for  example,  the  transformational 
parser  is  ready  and  waiting  for  a  new  application  to  another 
language  or  for  perhaps  just  another  application  that  makes  a 
more  demanding  use  of  certain  syntactic  constructions  which 
weren't  found  necessary  for  the  previous  one.   The  semantic 
interpreter likewise has got a permanent part that is implemented
once and for all. But the circles represent data that
are tailored to a specific application. The top four such
circles constitute the specification of a transformational
grammar. Although we produced a grammar specifically
devised  for  the  Fortune  500  application,  almost  all  of  that 
work  is  applicable  to  other  applications  as  well. 

We  also  specialized  Knuth-type  semantic  rules  to  the 
particular  deep  structures  which  our  transformational  grammar 
assigns  to  sentences.   Since  the  basic  underlying  structures 
assigned  to  sentences  can  be  expected  to  vary  very  little  from 
application  to  application,  so  too  can  the  corresponding 
semantic  rules  which  interpret  that  structure  be  expected  to 
remain  the  same  for  different  applications. 


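Figure 6
Overall System Organization

[Block diagram not recoverable from this copy; the box and circle
shapes referred to in the text are lost. Legible labels, in order
of processing: USER; Input Word String; PREPROCESSOR; Preprocessed
String; PARSER; Underlying Structure(s); SEMANTIC INTERPRETER;
Executable Code (Logical Form); RETRIEVAL; Output. The upper part
is labeled TRANSFORMATIONAL COMPONENT and the lower part
INTERPRETIVE COMPONENT; a DATA BASE appears beside RETRIEVAL.]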


Finally, the data base itself will vary from one application
to  another.   The  abstract  form  of  the  data  as  relational  n-tuples 
remains  the  same,  however.   Furthermore,  the  retrieval  component 
in  the  bottom  rectangular  box  remains  fixed.   It  is  concerned 
with  performing  actual  look  up  of  data  and  with  evaluating  the 
quantified  propositions  and  set  representations  referred  to  in 
Figure 6 as "logical forms". A final point to discuss about
Figure 6 is the various structural representations of a sentence
which  are  produced  during  processing.   In  particular,  I  would 
like  to  give  some  examples  of  underlying  structures  which  result 
from  syntactic  analysis  and  also  to  give  some  examples  of  the 
so-called  logical  forms  into  which  underlying  structures  are 
mapped  by  the  process  of  semantic  interpretation. 

Let's take a simple sentence first. In Figure 7 we have
the underlying structure which is currently assigned to the
sentence, Was IBM profitable in 1971? Feature information
associated with the nonterminal nodes has been removed in this
figure. As you can imagine, it isn't hard to turn this structure
into the form (PROFITABLE 'IBM '1971) which represents the
proposition formed from the predicate PROFITABLE and the (constant)
arguments IBM and 1971. This is precisely the logical form
into which our semantic component maps the structure of Figure 7.
The atoms IBM and 1971 are quoted because the LISP programming
system interpreter is used to evaluate logical forms. The
predicates in question must first be defined, of course, with
respect to the given data base.

Figure 7
Underlying Structure of:   WAS IBM PROFITABLE IN 1971?

[Tree diagram not recoverable from this copy. Legible node labels:
PROFITABLE, NOUN, INDEX, BD, IBM, 1971.]

As sentences get more complicated so, too, do their corresponding
underlying forms. Consider, for example, the sentence,
What companies' 1971 earnings exceeded $1,000,000,000?, whose
underlying structure is shown in Figure 8. (This structure was
assigned by our grammar of about two years ago. Our current
grammar assigns a slightly different structure, but the differences
are not relevant to the discussion which follows.) This
structure  is  typical  of  the  tree  structures  which  we  must  map 
into  computer-interpretable  logical  forms. 

Figure  9  is  a  simplified  version  of  Figure  8.   Feature 
information  has  been  suppressed  along  with  the  structure  assigned 
to   "$1,000,000,000".   An  informal  interpretation  of  this 
structure  is  as  follows.   The  GREATER-THAN  relation  is  seen  to 
hold  between  two  noun  phrases.   The  first  of  these  represents 
the  X5  such  that  X5  is  the  amount  of  money  earned  by  some  company 
X2  in  1971.   This  company  X2  is  WH-marked  to  reflect  that  its 
identity  is  to  be  determined.   The  second  noun  phrase  simply 
represents  "$1,000,000,000",  and  for  our  present  purposes  the 
structure  assigned  to  it  is  superfluous. 

Figure 8
Underlying Structure of:
WHAT COMPANIES' 1971 EARNINGS EXCEEDED $1,000,000,000?

[Tree diagram not recoverable from this copy. Legible fragments:
BD, V, 1,000,000,000, INDEX[- CONST], X16, DOLLAR, X8.]

Figure 9
Underlying Structure of:
WHAT COMPANIES' 1971 EARNINGS EXCEEDED $1,000,000,000?

[Tree diagram not recoverable from this copy. Legible node labels:
GREATERTHAN, THE, NOM, NP, BD, INDEX, AMOUNT, MONEY, EARN, NOUN,
V, WH, SOME, COMPANY, X5, X2, 1971.]

The semantic component must map the underlying structure
of Figure 9 into a logical form which represents the meaning
of the sentence as indicated above. To do this it basically
has only to remove unwanted nodes and, in some cases, to make
some minor changes to the tree structure. The logical form
which was produced by the REQUEST System about two years ago
is the old logical form of Figure 10. It represents the set
of elements X2 such that X2 is a company and the elements of
still another set are greater than 1,000,000,000. That other
set is the set of elements X5 such that X5 is the earnings of
X2 in 1971. This logical form was a reasonable starting point,
but it was lacking in at least one respect. In our example
the inner set has only a single member but this need not be
the case as, for example, in Were the earnings of all the
companies located in Chicago greater than $1,000,000,000?
When such a set with more than one member is an argument of a
predicate such as GREATERTHAN we must suitably define the truth
values which result. Although predicates such as GREATERTHAN
could be defined to permit sets as arguments, this amounts to
handling quantification by incorporating it in the definitions
of different predicates and then selecting the right predicates
to reflect the correct quantification of a given sentence. In
such a treatment of sentences not only is the treatment of
quantification opaque but there is a lack of uniformity in the
way that explicitly quantified and implicitly quantified
sentences are treated.

Figure 10
WHAT COMPANIES' 1971 EARNINGS EXCEEDED $1,000,000,000?

Old Logical Form:

    (setx 'x2
          '(and
             (greaterthan
               (setx 'x5
                     '(earnl x5 x2 '1971))
               '1000000000)
             (company x2)))

New Logical Form:

    (setx 'x2
          '(and
             (forall 'x31
                     (setx 'x24
                           '(and
                              (testfct 'NET_INCOME X2 X24 '1971 NIL)
                              (money X24)))
                     '(greaterthan X31 '1000000000))
             (company X2)))

For  these  reasons  we  decided  to  explicitly  represent 
quantifiers  in  our  logical  forms.   This  resulted  in  the  new 
logical  form  of  Figure  10  being  assigned  to  our  illustrative 
sentence.   This  logical  form  represents  the  set  of  elements  X2 
such  that  X2  is  a  company  and  for  all  X31's  which  belong  to  a 
certain  set,  X31  >  1,000,000,000.   That  certain  set  is  the  set 
of elements X24 such that X24 is the net income of (company) X2
in (the year) 1971, and X24 is an amount of money. (In addition
to the difference in treating quantification, the old and new
logical forms differ in other respects such as the use of a
generalized relational data base predicate TESTFCT to replace
a  host  of  individual  predicates.) 

One final example of a logical form is shown in Figure 11.
The sentence in question is, How large a number of people did
the companies in Chicago employ in 1969?, and the logical form
contains the set of elements X6 such that for at least one
element X46 of a certain set, X6 is the number of people
employed by (company) X46 in 1969. That certain set is specified
to be the set of elements X8 such that X8 is a company and all
elements of yet another set (which depends upon X8) are located
in Chicago. Finally, that last set is the set of elements X36
such that X36 is the current location of the headquarters of
(company) X8.

Figure 11

Logical form for:
HOW LARGE A NUMBER OF PEOPLE DID THE COMPANIES IN CHICAGO
EMPLOY IN 1969?

    (SIZEOF
      (SETX 'X6
            '(FORATLEAST 1 'X46
                         (SETX 'X8
                               '(AND
                                  (FORALL 'X45
                                          (SETX 'X36
                                                '(TESTFCT 'HEADQUARTERS X8 X36 '1973 NIL))
                                          '(LOCATED X45 'CHICAGO))
                                  (COMPANY X8)))
                         '(TESTFCT 'EMPLOYEES X46 X6 '1969 NIL))))

This  example  is  complicated  enough  to  show  that  logical  forms 
are  in  principle  easy  for  people  to  understand  but  in  practice 
quickly  become  difficult  to  understand  as  sentence  complexity 
increases.   They  are  suitable  computer  representations  of 
sentence  meaning,  however,  and  can  in  fact  be  evaluated  by  the 
LISP  System  interpreter  if  all  of  the  predicates  (such  as 
TESTFCT,  LOCATED,  FORALL,  FORATLEAST,  SETX,  etc.)  are  defined. 

The  representation  of  knowledge  in  the  REQUEST  System  is 
seen  to  be  inherent  in  the  definition  of  these  functions.  Certain 
predicates,  to  be  sure,  such  as  AND,  OR,  NOT,  FORALL,  FORATLEAST, 
and SETX are very general logical predicates which would be
found in any application. The predicate TESTFCT is also general;
its  arguments  designate  a  file  of  a  relational  data  base  and 
a  sequence  of  elements,  and  it  returns  a  value  of  TRUE  just  in 
case  that  sequence  is  one  of  the  rows  of  that  file.   Thus 
although  it  is  very  general,  its  use  will  vary  from  application 
to  application. 

The  remaining  predicates,  such  as  LOCATED,  are  those  for 
which  a  procedural  definition  is  preferred  over  a  relational, 
tabular  data  base  definition.   We  would  expect  a  number  of 
specialized  predicates  of  this  type  to  arise  in  each  new 
application,  but  the  total  use  of  these  predicates  would  be 
relatively  light  in  an  application  chosen  to  answer  questions 
relative  to  a  large  mass  of  tabular  data. 

The  predicates  which  appear  in  logical  forms  largely  reflect 
those  which  appear  in  the  corresponding  underlying  structures. 
They  can,  of  course,  be  tailored  somewhat  to  reflect  existing 
file  structure,  as  we  have  done  through  our  use  of  TESTFCT. 
The  serious  developer  of  a  question  answering  system  will 
sooner  or  later  discover,  however,  that  there  are  gaps  between 
his  logical  representations  of  meaning  and  his  file  structures 
(i.e.,  the  predicates  he  provides  for  dealing  with  tabular  data). 
There  are  several  ways  of  coping  with  this  problem.   The  first 
is  simply  to  implement  every  predicate  which  is  linguistically 
motivated  and  which  therefore  occurs  in  the  logical  representa- 
tion of  some  sentence  of  importance  in  a  particular  application 
being  studied.   This  is  the  easy  way  out  conceptually,  but  it 
has  unfortunate  practical  and  theoretical   consequences.   What 
results  is  a  wild  proliferation  of  predicates  which  are 
unrelated  in  the  model  even  though  they  are  very  much  related 
in  actuality.   Sometimes  the  relationship  between  such  predi- 
cates can  be  accounted  for  simply  by  defining  some  in  terms 
of  others.   This  can  take  the  form  of  a  direct  definition  or, 
for  relational  data  files,  it  can  take  the  form  of  replacing 
the  accessing  of  a  single  file  which  does  not  happen  to  exist 
by  the  accessing  of  a  sequence  of  files  through  the  use  of 
appropriate  file  keys. 
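
As a rough illustration of the last point, a predicate whose file does not exist can be defined by chaining two files that do. The sketch below is in Python; the file names, their rows, and the predicate hq_state are hypothetical and are not taken from the REQUEST data base.

```python
# Hypothetical files; names and rows are invented for illustration.
HEADQUARTERS = {("IBM", "ARMONK"), ("SEARS", "CHICAGO")}        # (company, city)
CITY_STATE = {("ARMONK", "NEW YORK"), ("CHICAGO", "ILLINOIS")}  # (city, state)


def hq_state(company, state):
    """True if the company's headquarters city lies in the given state.

    No (company, state) file exists, so the city serves as the key that
    links a HEADQUARTERS row to a CITY_STATE row.
    """
    return any(
        (city, state) in CITY_STATE
        for (row_company, city) in HEADQUARTERS
        if row_company == company
    )
```

Here hq_state("IBM", "NEW YORK") holds without any stored row relating companies directly to states, so no redundant file has to be maintained.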


An  alternative  to  explicitly  defining  some  predicates  in 
terms  of  others  is  to  do  this  implicitly  by  means  of  theorem 
proving  techniques.   The  total  amount  of  stored  information 
is  thereby  reduced  at  the  cost  of  increased  time  and  processing 
complexity.   This  appears  to  be  particularly  appropriate  where 
there  exist  secondary  files  that  are  derived  from  primary  files 
and  which  contain  such  information  as  aggregates  and  averages. 
In  such  cases  logical  forms  may  be  produced  which  are  logically 
impeccable  but  whose  evaluation  with  respect  to  a  given  data 
base  is  enormously  less  efficient  than  that  of  logically 
equivalent forms which access preprocessed, derived data.
Sometimes  the  use  of  equivalent  logical  forms  is  called  for 
because  the  lower  level  information  is  not  available  and  the 
higher  level  information  is  no  longer  derived  but  rather  primary. 
Such  a  case  in  our  current  Fortune  500  application  arises  from 
the  fact  that  the  total  number  of  persons  employed  by  individual 
companies  in  specified  years  is  contained  in  our  files,  but 
those  files  do  not  contain  information  about  individual  employees 
even  though  our  logical  forms  do  reflect  the  existence  of 
individual  employees.   We  are  currently  considering  the  use  of 
more  general  theorem  proving  techniques  to  replace  our  present 
ad  hoc  treatment  of  such  cases. 
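
One way to picture such a substitution of equivalent forms is as a rewriting step applied before evaluation. The following Python sketch is only a guess at the general shape of the idea; the tuple encoding of logical forms, the EMPLOY predicate, the LOOKUP_TOTAL form, and the data are all assumptions, and the actual ad hoc treatment in REQUEST is not described here.

```python
# Invented totals file: (company, number of employees, year).
EMPLOYEES = {("ACME", 5000, 1969), ("BOLT", 1200, 1969)}


def rewrite(form):
    """Replace a count over individual employees by a totals-file lookup.

    Matches ("SIZEOF", ("SETX", var, ("EMPLOY", company, var, year))) and
    returns ("LOOKUP_TOTAL", company, year); other forms pass through.
    """
    if isinstance(form, tuple) and len(form) == 2 and form[0] == "SIZEOF":
        inner = form[1]
        if isinstance(inner, tuple) and len(inner) == 3 and inner[0] == "SETX":
            _, var, body = inner
            if (isinstance(body, tuple) and len(body) == 4
                    and body[0] == "EMPLOY" and body[2] == var):
                return ("LOOKUP_TOTAL", body[1], body[3])
    return form


def evaluate(form):
    """Evaluate the only form this toy data base can answer directly."""
    if form[0] == "LOOKUP_TOTAL":
        _, company, year = form
        totals = [n for (c, n, y) in EMPLOYEES if c == company and y == year]
        return totals[0] if totals else None
    raise ValueError(f"cannot evaluate {form[0]} against this data base")


# "How many people did ACME employ in 1969?" stated over individual employees.
question = ("SIZEOF", ("SETX", "X1", ("EMPLOY", "ACME", "X1", 1969)))
```

Evaluating the original form fails because no file lists individual employees, whereas evaluate(rewrite(question)) is answered from the stored total. A theorem prover would derive such equivalences from general axioms instead of matching one fixed pattern.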

A  final  solution  to  the  problem  of  coping  with  the  prolifera- 
tion of  predicates  which  can  appear  in  semantic  representations 
is  to  limit  their  number  drastically,  by  fiat.   This  approach, 
which is typified by Roger Schank's limitation to eleven or
twelve primitive "actions", is superficially attractive but
does not stand up well under careful scrutiny. The difficulty
of adequately representing the meaning of natural language
sentences with only eleven or twelve primitive relations has
been discussed by many authors. There are two other problems,
however, which assume serious importance when one replaces
the consideration of isolated "toy world" examples by
the consideration of how to make possible the nearly unrestricted
use of natural language in a practical, real world application.


The  first  problem  relates  to  the  difficulty  of  assigning 
adequate   semantic  representations  to  a  rich  subset  of  English. 
Ad  hoc,  unstructured  procedures  are  fine  for  very  limited 
coverage  of  a  natural  language,  but  they  are  not  suitable  for 
assigning  semantic  representations  to  the  sentences  of  a  natural 
language  which  involve  interrelated  complex  constructions.  It  is 
no  accident  that  no  attempt  has  been  made  to  provide  much 
coverage  of  a  natural  language  for  any  linguistic  theory  based 
on  a  very  small  number  of  primitive  relations. 

The  second  problem  concerns  the  difficulty  in  using  semantic 
representations  couched  in  just  a  few  primitive  relations.  If 
the  tabular  data  of  practical  question  answering  applications 
are  viewed  as  a  relational  data  base,  the  relations  involved 
are  seen  to  be  rather  high  level  ones,  not  the  ultra  primitive 
relations  which  could  conveniently  interface  with  semantic 
structures of the type proposed by Schank. We experienced
a great amount of difficulty when the degree of mismatch
between semantic representations and the natural and convenient
representation of knowledge was much less than that which would
be encountered in attempting to apply Schank's "conceptual
structures" to any real world natural language application. It
therefore appears likely that conceptual structures are rather ill
suited for any realistic, useful purpose. To effectively dispute this conclusion
it  would  be  necessary  to  show  that  conceptual  structures  can  be 
interfaced  with  a  reasonable  representation  of  real  world 
knowledge  by  means  of  an  appropriate  inferential  component. 
Although  Schank  has  apparently  realized  the  necessity  of  showing 
this,  his  attempts  to  do  so  have  been  unsuccessful.   Instead  of 
starting  with  a  significant  body  of  knowledge  and  attempting  to 
demonstrate  that  it  is  possible  to  relate  the  semantic  structures 
of  a  sizable   subset  of  English  to  that  data,  he  has  provided 
semantic  structures  for  a  very  small  subset  of  English  and  then 
has  tailored  some  rules  of  inference  and  some  specific  bits  of 
knowledge  to  those  structures  which  suffice  to  generate  certain 
inferences from the sentences which can be handled. He has not,
however, shown that it is possible to relate his conceptual
structures  to  a  large,  nontrivial  representation  of  some  body 
of  knowledge. 

The  remaining  topic  I  would  like  to  discuss  is  the  problem 
of  extending  the  approach  we  have  followed  in  REQUEST  to  other 
applications.   We  started  looking  at  some  other  applications 
even  before  getting  very  deeply  into  the  Fortune  500  application. 
One  thing  which  was  lacking  in  our  pilot  project  was  an 
identifiable  natural  class  of  Fortune  500  users.   It  was  recog- 
nized that  without  such  people  attempting  to  interact  with  the 
Fortune  500  in  their  usual  manner,  evaluation  of  the  system 
would  be  difficult;  we  would  somehow  have  to  pose  question- 
answering  tasks  to  a  group  of  subjects  either  by  some  nonlinguistic 
means  or  at  least  by  some  means  which  is  far  removed  from  the 
allowable  means  of  natural  language  communication  with  the 
REQUEST System. Rather than face this problem we considered
applying  the  REQUEST  System  to  an  application  of  importance  to 
a  wide  range  of  users  in  order  to  be  able  to  evaluate  the  extent 
to  which  the  system  is  able  to  satisfy  their  need  to  query  a 
sizable data base. Another reason for considering a new
application was to be able to evaluate the extensibility of the
system and to ensure that its development proceeded in a way
which facilitated its extension to a range of different
applications.

We  considered  a  number  of  possible  new  applications  and 
found  most  of  them  lacking  for  such  reasons  as  sensitivity  of 
the  data  base,  nonexistence   of  the  data  base  in  machine  readable 
form,  inappropriateness  of  data  base  size  (either  too  small  or 
too large for our purposes), or nonavailability of interested
users.   As  a  result  of  our  consideration  of  new  applications 
we  identified  a  number  of  potential  candidates  for  a  practical 
natural  language  question  answering  system  and,  in  particular, 
we  singled  out  one  which  was  attractive  as  a  second  pilot 
application.   This  involved  the  interrogation  of  city  land  use 
records by such officials as assessors, city planners, and
management personnel. We have done a fair amount of informant
work,  collecting  input  which  potential  users  would  like  to  be 
able  to  submit  to  a  question  answering  system,  and  we  have 
evaluated  the  feasibility  of  extending  REQUEST  so  as  to  make 
this  possible.   Although  we  are  not  yet  committed  to  such  an 
undertaking  we  hope  very  soon  to  formalize  an  agreement  with  a 
nearby  town  to  customize,  install,  and  evaluate  a  REQUEST-based 
land  use  question  answering  system. 

The  first  question  we  considered  in  our  land  use  feasibility 
study  was  the  additional  syntactic  coverage  required.  In  this 
regard  I  at  least  partially  agree  with  what  Ralph  Grishman  said 
in  his  opening  remarks.   But  even  though  we  believe  it  possible 
to  provide  adequate  syntactic  coverage  for  our  new  application, 
we  estimate  that  most  of  our  effort  will  be  directed  toward 
achieving  this.   This  does  not  stem  from  large  syntactic 
differences  between  the  Fortune  500  and  the  land  use  applications; 
rather,  most  of  the  required  constructions  which  are  not 
allowed  in  our  current    REQUEST  grammar  are  constructions  which 
we   already  knew  were  necessary  and  which  were  already  ranked  in 
a  tentative  priority  of  adding  them  to  our  grammar.   Very  few 
constructions  were  identified   the  need  for  which  had  not  already 
been  recognized,  and  this  seems  to  indicate  that  once  an  adequate 
level  of  syntactic  coverage  is  achieved  for  one  application  the 
incremental  cost  of  extension  to  another  application  should  be 
relatively  modest. 

My  statement  that  most  of  our  effort  is  expected  to  go  into 
extending  syntactic  coverage  contrasts  with  the  estimates  of 
other  workers  whose  position  on  the  relative  difficulty  of 
extending  system  capabilities  in  syntax  and  semantics  is  more 
like  that  outlined  by  Ralph  Grishman.   In  large  measure  this 
difference  stems  from  the  relatively  large  load  which  we  place 
on  our  syntactic  component.   It  not  only  incorporates  feature- 
based  selectional   restrictions,  but  also  it  is  charged  with 
assigning  structure  which   represents  meaning  in  a  uniform  and 
straightforward  way,  which  enormously  simplifies  the  problem 
of  semantic  interpretation. 

There  are,  of  course,  lexical  extensions  required  by  a  new 
application,  but  these  do  not  pose  a  serious  problem  apart  from 
the  new  constructions  in  which  they  appear.   In  considering  our 
new  application  we  were  quick  to  add  extra  lexical  items  so  as 
to  be  able  to  try  out  some  sentences  whose  syntactic  construc- 
tions were  already  provided  for.   In  this  way  reasonable  logical 
forms  were  assigned,  but  they  could  not  be  evaluated,  of  course, 
without  defining  new  predicates  which  interface  with  new  data 
base information. Thus it is not enough to provide lexical
information about a new word such as commercial; it is also
necessary to define a corresponding predicate (COMMERCIAL unless
explicitly named something else) of one argument which is true
if its argument is a property which is zoned "commercial".

This  definition  of  new  predicates  is  the  main  task  which  is 
required  in  the  semantic  extension  of  REQUEST  to  new  applications. 
The  phrase  structure  rules  which  make  up  our  underlying 
structures  stay  the  same  from  application  to  application,  so  their 
translation  to  logical  form  is  virtually  unchanged.   This  situa- 
tion contrasts  markedly  with  that  in  other  question  answering 
systems  whose  syntactic  structures  (which  may  either  be  explicit 
or  implicit)  are  much  shallower  than  ours  and  hence  are  inter- 
preted by  a  very  complex,  application-specific  semantic  component. 

The  definition  of  required  new  predicates  is  facilitated  by 
the  use  of  a  general  table  look-up  predicate.   Such  a  predicate 
returns  true  for  a  particular  sequence  of  arguments  if  a  file 
corresponding  to  the  predicate  contains  a  row  which  consists  of 
just  that  sequence  of  argument  values.   Because  much  application- 
specific  information  can  be  viewed  as  such  a  relational  data  base, 
a  general  predicate  of  this  type  provides  the  implementation  for 
a  large  number  of  the  predicates  which  arise  in  a  new  application. 
This,  together  with  the  fact  that  many  application-specific  data 
bases  already  exist  in  machine  readable  form,  greatly  simplifies 
the  semantic  extension  of  the  REQUEST  System  to  new  applications. 
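
A general table look-up predicate of this kind might be sketched as follows. The Python below is an illustration under stated assumptions and not the REQUEST code: the helper table_predicate, the ZONING file, and the parcel identifiers are hypothetical, and defining COMMERCIAL by way of a zoning file is only one plausible choice.

```python
def table_predicate(rows):
    """Build a predicate that is true exactly when its arguments form a row."""
    table = {tuple(row) for row in rows}

    def predicate(*args):
        return args in table

    return predicate


# Hypothetical land use file: (parcel, zoning class). Rows are invented.
ZONING = table_predicate([
    ("PARCEL-12", "COMMERCIAL"),
    ("PARCEL-13", "RESIDENTIAL"),
    ("PARCEL-40", "COMMERCIAL"),
])


def COMMERCIAL(parcel):
    """One-argument predicate: true if the property is zoned commercial."""
    return ZONING(parcel, "COMMERCIAL")
```

Each further file of the application then yields its predicate through one more call to table_predicate, which is the sense in which a single general predicate supplies the implementation for many application-specific ones.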

There  are,  to  be  sure,  a  number  of  problems  which  assume 
serious  proportions  in  going  from  an  application  such  as  the 
Fortune 500 to one involving a larger data base. Most of these
concern the question of retrieval efficiency, and although
they  are  in  some  sense  secondary,  the  development  of  a  practical 
system  requires  their  solution.   We  are  currently  expending 
considerable  effort  on  such  problems.   Another  problem  concerns 
the  choice  of  semantic  primitives  and  the  problem  of  interfacing 
them  with  the  semantic  predicates  which  naturally  arise  in  a 
given  relational  data  base.   We  had  no  such  problem  in  our  pilot 
Fortune  500  project  because  we  simply  took  the  semantic  predicates 
which  were  motivated  on  linguistic  grounds  and  implemented  each 
of  them  separately.   In  some  cases  this  required  a  redundant 
storage  of  data,  but  this  was  acceptable  for  the  size  of  the 
data  base  in  question.   We  have  observed,  however,  that  in  a 
larger  data  base  redundant  storage  is  not  acceptable,  and  an 
alternative  means  of  bridging  the   gap  between  linguistic 
semantic  predicates  and  data  base  semantic  predicates  is  required. 
This  seems  to  be  a  good  candidate  for  the  use  of  inference,  and 
we  are  presently  considering  this.   In  particular  we  are 
examining  specific  cases  where  a  given  relation  is  not  directly 
represented in our given data base but is implied by several
other relations which are directly represented in the data base.
Note  that  data  base  semantic  primitives  tend  to  be  rather  high 
level,  and  thus  the  choice  of  a  small  number  of  relatively  low 
level  linguistic  semantic  primitives  makes  this  problem  of 
bridging  the  gap  more  difficult  rather  than  less  difficult. 

In  closing  I  would  like  to  observe  that  one  valid  criticism 
of  many  current  question  answering  systems  is  that  their  syntax 
and  semantics  are  so  intimately  tied  to  each  other  and  to  a 
particular  microworld   application  that  extension  to  a  new 
application  is  hopeless;  it  involves  ripping  apart  the  whole 
system  and  fabricating  a  new  one  to  deal  anew  with  the  vagaries 
of the new application. By contrast, the approach we have
taken with REQUEST involves well understood syntactic and
semantic information in a clear and uniform fashion. This
facilitates  extension  of  the  system  both  in  its  syntactic 
coverage  and  in  its  encompassing  of  a  wider  range  of  semantic 
phenomena  which  arise  from  a  consideration  of  new  applications. 


We  are,  of  course,  focusing  on  a  particular  type  of 
information,  a  relational  data  base  usually  represented  by 
tabular  data,  and  thus  the  appropriateness  of  our  system  to 
information  represented  in  some  other  fashion  must  be  examined. 
We  believe,  however,  that  this  is  a  particularly  important 
type  of  data  and  that  our  system,  REQUEST,  provides  a  well 
suited  basis  for  the  natural  language  interrogation  and  manipula- 
tion of  data  of  this  type. 


Questions 

Malcolm  Harrison  [New  York  University]: 

You  don't  happen  to  have  a  slide  with  an  example  of  a  sentence 
you  couldn't  handle? 
Petrick:

I have a handout which you can pick up if you want containing
examples of sentences we can handle and also a lot of sentences
we can't handle because they contain constructions not yet
provided for. Some of those constructions, such as conjunction
and pronominalization, we do process in certain restricted cases,
but we list them among the constructions not presently handled
because, unlike the constructions we claim to handle, they have
not been thoroughly provided for to work correctly with all our
transformations in producing the full range of sentences in
which they occur.
Harrison: 

Have  you  modified  your  extended  systems  so  as  to  be  able  to 
answer  questions  from  either  data  base? 
Petrick:

Yes,  we  have.   As  long  as   you  are  willing  to  have  both  data  bases 
in  at  the  same  time  this  isn't  a  problem.   We  would  have  to 
make  few  if  any  changes  of  the  type  which  involved  removing 
something  from  our  transformational  grammar  when  combining  two 
applications.   This  is  because  we  built  on  an  existing  grammar 
for  one  application  in  extending  it  for  another  application. 
The  same  is  true  of  the  semantic  component,  except  that  you 
might  want  to  extend  the  meanings  of  some  words  which  had 
rather  specialized  usage  in  a  restricted  initial  application. 
A  case  in  point  is  our  treatment  of  the  verb  located  in  our 
Fortune  500  application.   Because  of  our  limited  data  base  we 
assume  that  the  sentence   Where  is  IBM  located?  really  means 
Where  is  IBM's  headquarters  located?.   In  an  extended  system, 
however,  where  the  context  permitted  companies  to  have  plants 
scattered in lots of places, we would have to generalize the
meaning of located as applied to companies.

Bill Martin [MIT]:

A small point. Is that one billion which you displayed in a slide
dimensioned as a billion dollars?

Petrick:

Yes.

Martin:

So that all numbers carry around their dimensions?

Petrick:

Well, in the syntactic structure they do.

Martin:

And in the semantic they don't?

Petrick:

At the moment they don't. We talked about representing numbers
and their units of measure as pairs but haven't done so because
it wasn't required so far. If we allowed questions involving
different units or units that were distinct from those of our
data base we would have to do unit conversions and would have
to allow for the more complex dimensioned representation of
numerical quantities.
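
[Editorial illustration.] The pair representation mentioned in this answer could look roughly like the Python sketch below. It is an assumption-laden example and not part of REQUEST, which by the speaker's account does not carry units in its semantic representation; the unit names, the conversion table, and the function convert are invented.

```python
# Assumed conversion table: unit -> (dimension, factor to the base unit).
TO_BASE = {
    "dollars": ("money", 1),
    "million dollars": ("money", 1_000_000),
    "billion dollars": ("money", 1_000_000_000),
    "persons": ("count", 1),
}


def convert(quantity, target_unit):
    """Convert a (value, unit) pair to `target_unit` of the same dimension."""
    value, unit = quantity
    dimension, factor = TO_BASE[unit]
    target_dimension, target_factor = TO_BASE[target_unit]
    if dimension != target_dimension:
        raise ValueError(f"cannot convert {unit} to {target_unit}")
    return (value * factor / target_factor, target_unit)
```

A question phrased in billions of dollars could then be compared with data base values stored in millions by converting one pair to the unit of the other, while a comparison of dollars with persons would be rejected.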

Martin:

We did an experiment where we pretended to be a question
answering system and for people who hadn't used the system
before, we found that 2/3 of their questions were about what
data was on hand rather than actually asking for a summary of
the data. I wonder if you have worried about this or if for
your application areas you can ignore this problem because
your users already know what data they have?

Petrick:

We haven't provided for many meta-questions about the system
although you can ask a few such questions such as finding out
what companies are represented in the data base. But I agree
with you that questions of this type are very common, especially
when the system is used by persons other than those it was
intended for. We have asked many people to use our system in
order to find out whether we can handle their questions, and
our least successful sessions have been with my daughter and
some of her friends who know very little about the Fortune 500
and are not interested in business statistics so they tend to
inquire as to what they can ask questions about and to assume a
knowledge of the world which far exceeds the limited information
in the Fortune 500.

Peter Wegner [Brown University]:

Two  questions.   First,  concerning  the  example  that  ran  out  of 
time.   What  was  it  that  made  it  run  out  of  time? 
Petrick:

It  was  in  syntactic  analysis.   That  is  where  most  of  our  time 
goes  because  our  underlying  structures  are  deeper  than  those 
of  other  systems.   Our  relatively  more  complex  syntactic  analysis 
simplifies  the  amount  of  semantic  analysis  necessary,  so  the 
amount  of  time  we  spend  on  converting  underlying  syntactic 
structures  to  logical  representations  and  on  interpreting 
those  logical  representations  with  respect  to  our  data  base  is 
relatively  small.   We  have  taken  great  pains  in  writing  our 
current  grammar  to  minimize  the  nondeterministic  continuations 
that  can  lead  to  excessive  syntactic  analysis  time,  and  although, 
as  we  have  seen,  syntactic  analysis  can  still  take  a  lot  of  time, 
the  complexity  of  the  sentences  for  which  this  occurs  is  suffi- 
cient to  make  it  hard  for  humans  to  understand  those  sentences 
too. 
Wegner: 

The  other  question  concerns  your  review  in  Computing  Reviews  on 
Winograd's  work,  which  indicated  that  Winograd-like  systems  had 
no  future  because  they  were  too  ad  hoc  and  you  couldn't  go  on 
pushing  in  that  direction.   I  don't  know  if  you  care  to  comment 
on  that  in  relation  to  your  system,  which  is  obviously  not  the 
same  kind  of  system,  but  maybe  you  could  say  a  little  bit  about 
how  you  feel  the  future  of  intelligent  data  base  systems  will  go. 
Petrick: 
Well, I guess I would argue that there is a significant difference
between our system and Winograd's with respect to logical
coherence  and  extensibility.   Our  system  does  make  it  much 
easier  to  add  on  new  syntactic  constructions  which  are  fully 
integrated  with  old  ones  and  also  to  incrementally  enrich  the 
semantic  domain  treated.   I  have  talked  to  a  number  of  people 
who  have  tried  to  use  the  Winograd  system  and  who  report  great 
difficulty  in  extending  it  or  even  determining  the  reason  for 
its  failure  on  sentences  which   were  expected  to  run.   Perhaps 
someone  here  with  relevant  experience  would  care  to  comment 
on  this. 

Bill Woods [BBN]:

If you don't think that pushing the ad hoc kind of thing that
Terry [Winograd] did is the way to go, do you think that applying
the classical transformational analysis approach as you
are doing is the way to go when there are things like transition
networks around which are formal and which don't handle things
in the kind of ad hoc way Terry did, yet which gain just
tremendous speed over the classical reverse transformational
procedure?
Petrick: 

Well,  I  agree  with  you  that  given  my  choice  between  doing  what  he  did 
and  doing  what  you  did, I  much  prefer  your  approach.  But  it  is  not 
at  all  clear  to  me  that  the  tremendous  speed  advantage  that  you  are 
claiming  is  all  that  real  or,  to  the  extent  that  it  may  be  real, 
important.   We  seem  to  be  closer  to  the  same  ball  park  than 
I  would  have  expected  a  while  back,  so  I  think  there  is  something 
to  be  said  for  the  transformational  approach.   There  are  many 
transformational  grammarians  active  in  contributing  to  the 
transformational  literature,  although  I  must  acknowledge  that 
they  tend  to  talk  about  what  one  should  do  instead  of  actually 
doing  it.   But  I  think  that  making  use  of  the  model  which  is 
currently  being  used  by  linguists  offers  at  least  some  compensa- 
tion for  the  added  overhead  of  computation,  if  any,  incurred  in 
its  use.   This  certainly  hasn't  been  our  problem  in  developing 
and using the REQUEST System. We process sentences which
are submitted in something like five to ten seconds, and that
is  fast  enough  that  processing  time  is  not  a  problem  to  us. 
Rather,  our  largest  problem  is  in  extending  the  grammar. 
Although  I  haven't  followed  your  approach  I'm  sure  you  would 
agree  that  adding  some  new  syntactic  phenomenon  means  you've 
got  to  worry  about  integrating  it  with  a  lot  of  other  things 
which  already  exist. 
Woods:

I  think  we  have  less  trouble  than  you  do  with  the  classical 
transformational  grammar. 
Petrick:

Just  off  hand  I  would  say  just  the  opposite.   When  you  add  one 
transformation or group of transformations to a grammar to
extend  its  coverage  to  include  a  new  syntactic  construction  you 
have  to  be  concerned  about  the  set  of  intermediate  derived 
structures  involved.   Stating  transformations  reasonably  well 
in  the  first  place  is  an  exacting  task  as  is  debugging  them 
to  insure  they  work  as  intended,  but  nevertheless  in  a  modular 
way  you  can  throw  a  new  transformation  or  group  of  them  in  and 
thereby cover a new syntactic phenomenon. Whereas in your case
adding  a  new  syntactic  phenomenon  is  likely  to  add  new  paths 
from  many  different  states  and  hence  the  incremental  changes 
are  likely  to  be  distributed  about  the  whole  network,  compli- 
cating interactions  with  existing  paths  and  partial  paths. 
Of  course,  you  are  more  familiar  with  your  own  model  and  hence 
would  be  expected  to  find  its  use  in  extending  grammatical 
coverage easier than that of a model you have not used.
Woods:

Doing  it  hasn't  worked  out  to  be  that  difficult  when  your 
original  grammar  has  been  structured  in  the  right  way  as  you 
build  it.   We  found  that  as  you  start  parsing  sentences  with 
it  out  of  a  string  of  conversations  and  come  to  new  sentences 
that  it  doesn't  deal  with  correctly,  you  usually  find  that 
there  is  a  right  place  in  the  grammar  to  make  a  change.   That 
is equivalent to adding one or two more arcs or changing a
condition on an arc when you are generalizing something or
restricting something, and the change is effective, and you
don't seem to have the problem of unforeseen consequences that
you apparently do when you write classical transformations.

Petrick:

I am not sure what you mean by unforeseen consequences.

Woods:

If you take a transformation that can take a certain pattern and
move it around, and you were thinking of it applying in this
context but you didn't realize that there was a context sort
of like that which comes up somewhere else and you didn't put
a 'not' specification in your structural description to rule out
that other context, and so the thing applies to an intermediate
structure that you hadn't remembered was a possible intermediate
structure that might come up. And the result is just garbage.

Petrick:

That kind of thing can happen, and the real question is how often
does it happen and how hard is it to find and correct relative
to similar problems you have when you throw in changes designed
to extend grammatical coverage and thereby, for example,
inadvertently specify paths (sentences) which were not intended
and which must be eliminated by separating by duplication paths
which have joined together or else adding extra conditions on
some of the arcs.

Woods:

I don't think we can resolve that.

Petrick:

No. It would require an individual who had been exposed the same
amount of time to both formalisms and who attempted to achieve
approximately the same coverage using both models.

Woods:

But I don't think your question about a speed difference, whether
it is real or not, can be answered on a more concrete basis.
You make the case that your system has gotten into the same ball
park by virtue of the machine you have run it on getting faster
and faster.

Petrick:

That is part of it.
Woods:

I  designed  an  algorithm  that  was  fundamentally  fast,  but  I 
implemented  it  in  a  way  that  deliberately  had  a  lot  of  over- 
head, and  I  ran  it  on  a  time  sharing  machine  which  pages  and 
introduces  overhead.  It  was  written  in  LISP,  which  introduces 
overhead  of  its  own.   And  my  machine  is  not  all  that  fast. 
Petrick:

We  have  done  a  lot  of  thinking  about  improving  parsing  efficiency 
and  I  know  lots  of  ways  in  which  I  could  make  my  parser  run 
significantly  faster.   I  haven't  done  it  because  I  feel  that 
it  would  be  mistaken  effort  at  this  point.   I  think  that  if 
it were the case that we had something that was really worthwhile
and the only problem was that it ran too slowly, that's a
problem  about  which  quite  a  bit  could  be  done. 
Question  [unidentified] : 

Don't  you  also  use  a  LISP  system  running  on  a  time  sharing  system? 
Petrick: 

Yes.   But  it  is  true  that  the  machine  is  basically  ten  times  as 
fast.   I  am  not  saying  that  our  computer  isn't  faster,  and  I  am 
sure  our  parser  is  slower  than  yours.   They  are  in  some  sense 
disparate  because  your  structures  are  relatively  not  as  deep  as 
ours  and  we  do  quite  a  bit  in  what  you  call  semantics  that  we 
would  call  syntax. 
Question: 

The  language  part  and  the  answering  part,  that  is  an  easy  place 
to  break. 
Martin: 

I  actually  have  a  grammar  that  various  undergraduates  have  been 
putting together and to parse something like "How much did we
sell to Syrus in 72?" takes about 2/10 of a second on a PDP-10.
And  I  found  it  also  very  easy  to  add  things,  so  this  is  a  good 
way.   I  believe  that  what  is  going  to  happen  is  that  we  are 
going  to  complete  this  thing  so  that  there  is  going  to  be  some 
sort  of  a  race  developing,  over  time. 




Petrick:

It  is  clear  that  when  you  compare  times,  at  least  syntactic 

processing  times,  we  are  slower  than  you  are,  but  I  don't  think 

that  the  difference  is  an  overriding   consideration  now. 

I  think  a  more  important  consideration  has  to  do  with  what  kind 

of  coverage  of  English  can  we  get.   Is  that  coverage  enough 

for  doing  something  useful  and  are  the  structures  that  we 

compute  adequate  for  doing  the  kind  of  semantic  processing 

which  is  necessary? 


(Ed. note added by S. R. Petrick:

Subsequent  to  this  exchange  with  Bill  Woods  I  ran  four  sentences 

under  REQUEST  which  differ  only  lexically  from  sentences  for 

which  Woods  has  supplied  timing  data  in  the  literature. 

Thus in place of Woods' "How many lunar samples are there?"

I ran "How many profitable companies are there?"

The  total  times  for  producing  a  logical  form  for  each  of  these 

four  sentences  were: 

Sentence     Time in Seconds     Time in Seconds
                (LSNLIS)            (REQUEST)

   1             14.865               3.607
   2              7.191               4.031
   3             11.856               3.465
   4             17.525               3.235

Both systems are programmed in LISP and run on paged time sharing
systems. The LISP systems, time sharing systems, and
computers  used  by  the  two  systems  are  different,  of  course. 
To  meaningfully  factor  out  these  differences  one  must  compare 
the  speed  of  the  two  LISP  systems  running  on  their  respective 
computers  under  their  respective  time  sharing  systems.   One  such 
study  was  made  at  the  IBM  Research  Center  comparing  the  time 
required  to  execute  some  relatively  small  compiled  LISP  functions, 
and  a  speed  advantage  of  three  to  one  was  reported  for  our 
LISP system over the PDP-10 system. Another study suggested an



even  lower  ratio  was  more  likely  correct.   Dividing  all  of 
Woods'  times  by  three  produces  computer  system-normalized  times 
which  are  still  longer  than  ours  for  three  of  the  four  sentences. 
I  would  be  cautious  in  concluding  too  much  from  such  a  small 
sample, but the results do seem to be consistent with my observation
that both systems appear to be in the same ball park with
respect to speed. In particular, it should be noted that the
times  can  be  meaningfully  compared  only  if  the  coverage  of 
English  provided  by  the  two  systems  is  comparable,   and  I   have 
made  no  such  study.   It  is  clear,  however,  that  both  of  these 
systems offer substantially greater coverage of English than
other  question  answering  systems  which  have  been  developed,  and 
this  is  the  reason  why  some  of  the  other  systems  report  very 
fast  sentence  analysis  times.) 
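
The normalization described in the note can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original report: the variable and function names are illustrative, the times are those in the table above, and the factor of three is the speed ratio the note cites from the IBM study.

```python
# Times in seconds, taken from the table in the editor's note.
lsnlis = [14.865, 7.191, 11.856, 17.525]
request = [3.607, 4.031, 3.465, 3.235]

# Speed advantage the note reports for the IBM LISP system over the PDP-10 system.
SPEED_RATIO = 3.0


def normalize(times, ratio=SPEED_RATIO):
    """Divide each time by the assumed machine speed ratio."""
    return [t / ratio for t in times]


normalized = normalize(lsnlis)

# True where the normalized LSNLIS time is still longer than the REQUEST time.
still_longer = [n > r for n, r in zip(normalized, request)]

for i, (n, r, longer) in enumerate(zip(normalized, request, still_longer), start=1):
    print(f"sentence {i}: LSNLIS/3 = {n:.3f}  REQUEST = {r:.3f}  still longer: {longer}")
print(sum(still_longer), "of", len(request), "normalized LSNLIS times are longer")
```

Under this assumed ratio, three of the four normalized LSNLIS times remain longer than the REQUEST times; sentence 2 is the exception. A lower ratio, as the second study mentioned in the note suggests, would only widen the gap.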




DISCUSSION 

Among  the  speakers  (William  Martin,  Stanley  Petrick,  Naomi  Sager, 
Robert  Simmons,  William  Woods)   and  members  of  the  audience. 

Jerry  Hobbs  [City  College  of  New  York] : 

Most  of  the  applications  that  all  of  you  have  been  talking  about 
are  in  fairly  complex  fields  like  pharmacology,  lunar  rocks, 
and  Fortune  500,   assessor  data,  etc.,  and  the  trouble  with 
dealing  with  this  is,  as  Petrick   mentioned,  that  you  keep 
falling  through  holes  in  your  model,  asking  questions  about 
words  like  "commercial"  which  are  not  stored  there.   It  seems 
to  me  that  a  more  reasonable  strategic  approach  to  the  whole 
problem  would  be  to  start  at  very  common,  low  level  kinds  of 
areas.   For  example,  space  and  time  terms,  verbs  of  motion, 
mental  acts   and  a  very  simple  kind  of  social  and  economic 
behavior,  so  that  in  a  sense  you  build  up  a  net  that  will  catch 
you  when  you  fall  through  the  holes  in  your  models  for  the  larger 
systems.   Is  this  a  reasonable  approach? 
Woods : 

It  depends  on  what  you  are  trying  to  do.   If  you  want  to  build 
a  system  that  doesn't  have  any  holes  in  it,  then  you  can  restrict 
yourself  to  something  that  is  small  enough  and  simple  enough 
that  you  won't  hit  any  of  them.   But  that  is  exactly  the  wrong 
thing  to  do  from  my  point  of  view. 
Hobbs : 

That  is  not  true;  any  time  you  try  to  restrict  yourself  there  is 
a natural tendency for real world text to go beyond your
restrictions.
Woods : 

The  point  I  would  like  to  make  is  that  the  methodology  you  want 
is  exactly  the  one  that  causes  things  to  fall  into  holes  in 
your  system.   Because  what  you  want  to  do  is  find  those  holes. 
And  in  fact  the  methodology  I  would  advocate  is  to  take  a 
restricted  problem  domain  because  for  such  a  domain  you  can  have 




a  concrete  notion  of  what  the  things  that  are  said  in  it  mean. 
You  can  tie  it  to  actual  answers  that  had  better   come  back  out 
of  the  system  or  actual  things  that  it  had  better  do.  But  you 
should  pick  that  application  to  raise  hope,  to  uncover  things 
that  you  don't  know  how  to  understand.   Because  what  you  would 
like  to  do  is  develop  a  set  of  techniques  for  understanding 
language  that  will  become  general  purpose  and  will  cover  all  of 
the  kinds  of  things  that  will  turn  up.   And  you  find  as  you  go 
from  one  application  to  another  that  different  sets  of  things 
become  problems.   In  the  airline  flight  schedules  system,  for 
instance,  one  talked  about  planes  and  cities,  and  hardly  ever 
talked  about  people.   And  so  we  ruled  people  out,  because  as 
soon  as  you  get  people  in,  you  get  a  whole  set  of  intentional 
verbs  that  can  only  take  human  subjects.   As  long  as  you  keep 
the  human  subjects  out   you  can  keep  those  verbs  out.   And 
those  verbs  have  a  lot  of  problems  with  them.   So  one  way  to 
restrict  yourself  to  a  smaller  set  of  problems  is  to  restrict 
yourself  to  a  domain   in  which  those  problems  won't  come  up. 
But  once  you  have  got  some  techniques  that  deal  with  that  problem, 
then  you  should  move  to  another  domain  that  is  going  to  bring 
up  a  different  set  of  problems   and  test  whether  the  techniques 
you  have  developed  on  your  first  application  area  will  in  fact 
extend  to  a  second  application  area,  and  a  third  application  area, 
and  see  what  new  things  come  up.   I  have  gone  through  a  sequence 
of  several  applications  now,  with  the  same  parser  and  the  same 
semantic  interpretation  technique  and  encountered  new  kinds  of 
phenomena  each  time.  In  the  lunar  application,  mass  nouns  ... 
Hobbs : 

The  trouble  with  that  is  that  the  data  base  you  have  filled  up 
in  that  manner  for  each  application  is  worthless  when  you  go 
to  the  next  data  base.   Perhaps  the  insights  you  gained  aren't 
worthless  but  the  data  base  you  built  up  is.   Whereas  if  you 
just  took  as  your  applications,  so  to  speak,  very  low  level 
kinds  of  things,  like  space-time  terms,  verbs  of  motion,  and 
mental  and  social  and  economic  acts  of  a  very  simple  nature, 
these  would  be  the  kinds  of  things  that  occur  in  almost  every  text. 




Woods : 

Let  me  respond   to  two  aspects  of  what  you  say,  one  of  which 
I  agree  with  wholeheartedly.   The  specific  facts  you  are  talking 
about  building  up  a  model  for   are  very  well  worthwhile  doing, 
and  ought  to  be  a  common  subpart  of  almost  any  system  that  people 
would  like  to  build.   The  knowledge  of  space  and  time  and  people 
and  dates  and  places  is  a  common  body  of  knowledge  that  you  would 
like  to  have  in  the  system  in  a  codified  form  that   people  could 
use.   OK.   But  the  other  assumption  that  says  you  should  stay 
with just those, you shouldn't tread outside, you shouldn't try
to  pick  other  problems  --  I  disagree  with  that. 
Sager : 

I  think  it  is  deceptive  to  think  those  areas  are  simple. 
Hobbs: 

I  am  not  saying  they  are  simple.   I  am  saying  that  they  are 
necessary  to  understand  almost  any  text. 
Sager: 

Yes,  but  they  don't  have  anything  you  can  work  with.   There  are 
lots  of  problems  that  are  unsolved  in  the  world.   The  only  ones 
you can solve are the ones you have methods for. Now in restricting
yourself to a subject matter area, as Bill [Woods] is
suggesting,  and  the  way  that  we  are  working,  at  least  you  know 
that  there  are  some  constraints  operating  in  this  material, 
that  you  have  a  chance  of  capturing  in  some  kind  of  structuring. 
If  you  take  the  most  colloquial,  the  most  common,  these  are  the 
words  which  have  many  different  meanings  in  many  different 
situations;  they  are  precisely  the  most  difficult  things  to 
operate  with.   Perhaps  when  we  can  solve  the  problem  in  this 
and  this  and  this  area,   it  will  turn  out  that  these  simpler 
areas  are  really  smears  over  a  lot  of  much  more  precisely 
defined  areas. 
Petrick: 

You  seem  to  feel  that  if  we  have  very  primitive  relations  into 
which  you  break  everything  down,  then,  having  done  this  for  one 
application,  you  have  got  part  of  what  you  need  to  get  done 




already  accomplished  when  you  go  on  to  another  application.  But 

it  might  be  more  complicated  to  express  the  new  concepts  that 

are  going  to  arise  in  either  case  in  terms  of  your  old 

primitives.

Hobbs : 

All  I  am  suggesting  is  that  you  would  have  this  common  base  as 

a  net  to  catch  you  when  in  fact  you  did  fall  through  the  holes 

in  the  models  that  you  proposed  for  more  complex  areas. 

Woods : 

How would it help you if you fell through a hole where you just

didn't   understand   something   about   disintegrations    of   isotopes, 

say?      You  know    if   you    fall    through    a  hole    in   that   model   your 

net   is   not    going   to   help   any.       It   is    only    going   to   help   in 

certain   holes. 

Petrick : 

These  are  things  you  have  to  relate  to  something.   It  is  a 

question of whether you are relating them to primitives you

have  already  used  or  whether  you  are  to  make  up  a  whole  new  set 

for  this  particular  purpose;  and  sometimes   one  approach  might 

be   better,  sometimes  the  other. 

Martin: 

I    happen    to    feel    it   is    not   that   hard   to   tell   how  well  you   are 

doing.       A   critical    assessment   of  what   you  have    generally   shows 

all   kinds    of    things    that   you   don't   understand.       So   that   you 

don't  need,    each    time   you   use    it,    to  point    out   many    interesting 

problems      that   you   don't  have    solved  yet. 

Woods : 

Sometimes    you   do. 

Martin: 

By   various    techniques    other   than    implementing  you   can    learn    a    lot, 

and   I    think    that   you   should  pursue    a   strategy    using   various 

techniques    and   looking   at   how   much   each   one    takes    in    the  way   of 

resources.       For   instance,    one    thing  we    did   is    pretend   to  have    a 

system.      A  person    in    another   room   in    fact    fills    in  when   the   system 

can't   do   something   and   users    just   type    in    as    if   it  were    the 

system  that  you  wanted.      You  record   all    the   protocols,    and 




analyze  how  many  words  they  use,  how  many  sentence  types;  gee, 

could  I  do  that  one,  could  I  not  do  that  one.   You  see  what  the 

problems  are  that  occur  there   and  this  provides  information 

that  is  very  cheap  compared  to  building  a  system  like  that. 

You  can  pick  any  area  you  want  to,  to  see  what  it  will  do  in 

that  area.   We  did  some  of  this  with  speech,  in  fact,  also, 

with the terminals in different rooms, etc. That is the

strategy  you  have. 
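
The protocol analysis Martin describes (counting the words and the sentence types that users produce) can be sketched as follows. This is an illustrative Python sketch, not something presented in the discussion: the sample protocol lines, the function names, and the very crude classification rule are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical lines typed by users during a simulated-system session.
protocol = [
    "How much did we sell in 72?",
    "List the sales for each region.",
    "How many customers ordered in 72?",
]


def words(line):
    """Lower-case word and number tokens in one protocol line."""
    return re.findall(r"[a-z0-9']+", line.lower())


def sentence_type(line):
    """Crude classification by leading word and final punctuation (an assumption)."""
    tokens = words(line)
    first = tokens[0] if tokens else ""
    if first in {"how", "what", "which", "who", "when", "where"}:
        return "wh-question"
    if line.rstrip().endswith("?"):
        return "yes/no question"
    return "command or statement"


# How many distinct words were used, and how often each sentence type occurred.
vocabulary = Counter(w for line in protocol for w in words(line))
types = Counter(sentence_type(line) for line in protocol)

print(len(vocabulary), "distinct words")
print(dict(types))
```

A real study would need a far better notion of sentence type, but even counts this rough indicate how much vocabulary and how many constructions a system would have to cover.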

Another  strategy  involves  trying   a  representation  scheme. 

Just  sit  down  with  an  article  and  a  field  of  interest  and 

try  to  represent  a  couple  of  sentences.   You  may  not  get  past 

the  first  paragraph  before  you  have  all  kinds  of  ideas  of  the 

limitations  of  your  scheme.   All  of  these  techniques  can  lead 

you  to  a  set  of  problems  which  then  --  once  you  really  feel 

you  have  a  solution  to  them  --  you  may  implement,  say,  in 

something  very  simple.   You  can  generally  contrive  a  way  to 

test  very  many  things  in  a  simple  environment   if  you  know  what 

the  ideas  are  that  you  want  to  test.   So  I  don't  think  you  should 

just  work  on  one  thing   and  just  have  one  strategy  which  is 

implementation.   I  also  feel  that  you  can  trust  yourself  to 

look  for  the  problems  to  a  greater  degree  than  most  people  think. 

There  seem  to  be  two  schools  of  thought:  those  who  say  you  have 

to  get  it  out  and  get  it  used  a  lot;  and  those  who  say,  well, 

we  haven't  had  any  users  at  all.   In  fact,  one  user  is  a  very 

important intermediate step. When we built our algebraic

manipulation  system,  it  took  us  about  two  or  three  years  to 

be  able  to  handle  all  the  problems  of  just  one  user. 

Every time he would come in, we would have to work for 3 or 4

months.   A  heavy  user  just  wasn't  in  the  cards. 

Nagib  Badre  [IBM,  Yorktown] :    [to  Woods]: 

You  said  that  one  can  represent  the  meaning  by  a  procedure  that 

decides  the  truth  or  falsity  of  something,  and  then  you  mentioned 

the  halting  problem  to  show  that  it  has  to  be  a  partial  procedure. 

It  seems  to  me  that  it  is  really  more  than  that.  Because  if  I  ask, 

is  this  grammar  LR(k)  for  some  k,  the  reason  people  understand  it 




is  not  at  all  connected  with  it  being  either  true  or  false 
or  undecidable.   They  understand  it  because  they  know  what 
it  is  for  something  to  be  a  member  of  a  class,  if  they  know 
what  that  class  is;  they  may  look  at  the  grammar  and  may  not 
be  able  to  answer  the  question  and  yet  still  understand  it. 

I  would  like  to  cite  some  psychological  issues,  if  I  may. 
When  psychologists  give  sequences  of  words,  meaningful  words, 
and  they  make  memorization  tests  on  people,  it  is  known  that 
people  remember  better  sequences  of  meaningful  words  than 
sequences  of  nonsense  words.   I  think  there  have  been  also 
experiments  where  nonsense  words  would  be  given  except  that 
right  after  the  experiment  there  would  be  a  certain  number  of 
nonsense  sentences  made  up  of  those  nonsense  words;  and  then 
memorization would be better with nonsense words. This relational
aspect, this idea of being able to relate something to
other  things,  seems  to  be  very  important  in  what  we  defined  as 
understanding  and  giving  meaning  to  something. 
Woods : 

That  is  true.   I  had  a  list  of  different  things  that  called 
themselves  semantics   and  certainly  I  raised  the  point  that  an 
expression  which  is  a  meaning  or  which  represents  a  meaning 
has  to  have  its  meaning  defined  in  terms  of  something  like 
procedures. But you should not infer that therefore the particular
structure of this set of symbols is unimportant, that all that
is important is the thing that it represents. It turns
out  that  when  you  start  worrying  about  how  to  effect  the 
meanings  of  adverbs  in  a  uniform  way  you  start  to  find  that 
it  is  not  enough  to  say  that  a  meaning  of  a  certain  expression 
is  a  procedure  for  testing  truth  or  falsehood.   You  have  to 
know  something  about  the  way  that  procedure  is  structured. 
You have to be able to look at the structure of the representation
of the procedure and infer facts about it and infer
that  it  is  related  to  other  facts  and  so  forth.   So  the  internal 
way  that  the  procedure  is  specified  is  important  for  many 
applications. In the LUNAR system, at least in the way that I




have    implemented   it,    it   is   not   so   important.      But   it   certainly 
is    important    for   a   general    language    understanding   system  which 
is    going    to   deal  with    structured   or      nonstructured   natural 
language    data.    The    internal    representations    of   structures    are 
going   to  be    important.      The    semantic   network    type    of   notations 
that   people    are    developing,    that   have    specific    links    from  one 
concept   to    another,    I    think    are    getting   close   to   an   element 
of   truth      in    the    same   way    that    transformational    grammars   have 
a   large   element   of   truth    in    the   way    that    they   encode    language 
behavior.      But    they    also   have    some    limitations    and   some 
inadequacies    that  we   haven't    found   solutions    to  yet.      Most   of 
the semantic network representations don't deal with quantificational
knowledge very well. There are other uses of semantics
that   show   up      especially   in    speech   understanding   where  you   use 
semantics   not   just    to  make    judgements    —   well,    is    this   well 
formed or not. You use it not just to answer questions by translating
what it means into a procedure to carry out. But you use
semantics    to  predict  pieces    that    are   missing,    that   you  haven't 
heard  yet.      So   if   you  get    a   sentence    like    'has    anybody   blanked 
chemical    analyses    of    this    rock',    you  know    from  your  knowledge    of 
semantics    that    the   blank   should  be    'measured'    or    'reported' 
or    'done'    or   some  word   of   that   sort.      And   in    speech    understanding, 
that   sort   of    use   of    the    associational    semantics    becomes    really 
important.       The    associational   semantics    is    important    for   the   kinds 
of    foregrounding  effects    that   people   exhibit      when  you   have   half 
a   dozen   senses   of   a  word,    but  you   don't   notice    the    other   ones 
because    just   the    one    that   makes    sense    seems    to   come    to  mind. 
And you can model that with Quillian-type semantic intersection
theory    that   would   say:       the   one    that    is    right    in    this    context 
will   happen    to   be    close    to   other   concepts    that   have   been    activated 
by   previous    discussion;    and   the    one   that    is   wrong    for    this    context 
won't  be;    and   that   will   be    the    tie   breaker    that  will   choose    the 
sense   you  want.         There    are   much   more    subtle    things    that    go   on 
in    language    understanding   that   aren't   quite    that   easy    to  explain, 
though.    Terry  Winograd      has    in   his    thesis    a  pair   of   sentences 
which    is   becoming    famous    as    the   City   Council   problem.       He   has: 




'The  City  Council  refused  to  grant  the  women  a  permit  because 
they feared violence', and

'The  City  Council  refused  to  grant  the  women  a  permit  because 
they advocated violence'.

And  the  question  is:   does  'they'  refer  to  women  or  City  Council? 
And  people  seem  to  do  that  disambiguation  just  as  fast  and  just 
as  easily  and  just  as  conveniently  with  no  conscious  awareness 
that  they  had  to  do  anything  at  all,  as  these  other  cases  that 
we  talked  about.   But  it  seems  that  there  is  no  simple  kind  of 
superset   type  inference  that  resolves  that.   If  you  would 
simply  ask:   Who's  likely  to  fear  violence,  the  women  or  City 
Council?   In  the  total  absence  of  any  other  context,  if  there 
is  any  preference  at  all  it  might  be  women   because  there  are 
always  stories  about   violence  in  the  streets.   And  women  are 
usually  the  canonical  victims  that  evoke  the  sympathy  of  the 
crowds;  City  Councils  just  don't  get  talked  about  in  those  terms. 

So  clearly  it  is  something  about  whether  it  constitutes  a 
reason  for  refusing  to  grant  the  permit  that  is  really  making 
the  difference.   And  it  seems  that  in  order  to  get  that  resolved 
you  have  to  virtually  go  as  far  as  to  build  up  both  possible 
interpretations  and  evaluate  which  is  more  plausible.   And  if 
it  is  the  case  that  we  do  all  that  as  rapidly  and  unconsciously 
and effortlessly as we apparently do, then the amount of processing
that we do totally below the level of consciousness and
extremely  rapidly   is  a  lot  more  than  one  might  expect. 
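
The tie-breaking idea that Woods attributes to Quillian-type semantic intersection, in which the right sense of a word is the one closest to concepts already activated by the discussion, can be illustrated with a toy sketch. This is not Quillian's program, nor anything from LUNAR: the tiny network, the sense labels, and the function names are invented for illustration, and a plain breadth-first link count stands in for spreading activation.

```python
from collections import deque

# Hypothetical toy network: undirected links between concepts.
LINKS = {
    ("bank/finance", "money"), ("money", "loan"), ("loan", "interest"),
    ("bank/river", "water"), ("water", "boat"), ("boat", "fishing"),
}


def neighbors(node):
    """Concepts directly linked to node."""
    for a, b in LINKS:
        if a == node:
            yield b
        elif b == node:
            yield a


def distance(start, goal):
    """Number of links on the shortest path, or None if unconnected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if node == goal:
            return d
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None


def choose_sense(senses, activated):
    """Pick the sense closest to any concept activated by prior discourse."""
    def score(sense):
        found = [distance(sense, c) for c in activated]
        found = [d for d in found if d is not None]
        return min(found) if found else float("inf")
    return min(senses, key=score)


print(choose_sense(["bank/finance", "bank/river"], {"boat"}))
print(choose_sense(["bank/finance", "bank/river"], {"interest"}))
```

As Woods points out, proximity of this kind does not settle the City Council pair, which seems to require building both interpretations and judging which is more plausible.
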
Linda  Misek  [Vassar  College]: 

I would like to go back just one minute to the question of whether
there might be conceptual archetypes of basic semo-syntactic,
or however you want to label, deep structure
primitives.   I  would  appreciate  the  remarks  of  anyone   on  the 
panel  concerning  one  specific  point:  Granted  that  when  you 
partition  the  world  using  categories  you  do  it  in  order  to  get 
a  job  done;  probably  all  projects  are  problem  solving  oriented 
and  the  operational  approach  is  the  most  promising.   Do  you, 
as you look across each other's current semantics, case structures,




or  category  constructs  that  you  each  presented  today  see  any 
consistencies   or  items  which  might  be  constants  or  candidates 
for  being  constants?   Such  items  might  be  the  very  nodes  that 
this gentleman [Hobbs] is talking about when he says a 'net'
-- when you fall through a hole, some primitives or constants
consistent  across  all  of  your  descriptions   that  might  plug  the 
hole.   Are  you  very  optimistic  that  these  exist?   I  hope  so 
because  if  you  are  not   what  will  we  do? 
Simmons : 

I  think  there  are  things  like  cases  showing  up  somewhere 
between the shallow syntax and the semantics in almost everybody's
system. I don't know whether Petrick's system shows
anything  resembling  those  categories  or  not.   There  is  a  need 
in  every  system  for  some  kind  of  quantificational  logic. 
There  is  a  need  in  every  system  for  some  kind  of  temporal 
representation.   I  don't  know  how  to  go  much  further  than 
that  though. 
Martin : 

I  have  looked  at  some  and  I  don't  know  if  I  can  restrict  myself 
just  to  the  particular  systems  here;  but  there  seems  to  be  a 
set  of  issues  that  helped  me  understand  some  of  the  differences 
between  some  of  the  schemes  that  I  have  seen.   And  one  of  them 
is  that  there  seems  to  be  a  notion  that  Schank  and  a  lot  of 
people  have,     a  fairly  deep  notion,  that  something  starts 
in  an  initial  state  and  transfers  into  some  final  state;  maybe 
this  thing  is  called  the  patient.   For  instance,  consider  the 
sentences  "I  go  from  school  to  home",  and  "I  left  school". 
The  surface  guys  are  going  to  say  that  'school'  is  the  direct 
object  in  the  second  sentence,  since  it  doesn't  start  with 
'from';  there  is  no  'from  school'  in  the  second  and  therefore 
it  is  different  from  "I  go  from  school".   Whereas  the  deep 
structure  people  are  going  to  say  no,  it's  really  the  same  thing; 
it's  just  a  different  way  of  saying  it.   So,  there  seems  to  be 
some  bias  as  to  how  deep  down  you  are  trying  to  go,  what  set  of 
cases  you  end  up  with.   Whether  you  let  things  like  prepositions 
influence  you  very  much  in  your  choice,  for  example.   If  you 




have  some  notion  of  what  the  alternatives  are,  there  is  not 
that  much  difference  between  some  of  the  different  systems. 
Simmons : 

Even  in  string  analysis  or  transformational  analysis,  it  is 
fair  to  consider  subject  and  object  and  indirect  object 
as  cases,  as  very,  very  shallow  cases. 
Woods : 

There are places where you want different levels of representation,
and different kinds of representation. Schank tends
to  pick  one  and  then  he  makes  the  point  that  the  place  he  picks 
is  arbitrary;  I  think  that  what  actually  goes  on  is  that  you  have 
the capability to refine or to elevate your level of description
in terms of primitives depending on the level of knowledge
the  problem  requires.   In  fact  there  is  a  system  we  haven't 
mentioned here today by John Seely Brown that does intelligent
computer-aided  instruction  for  an  electronics  troubleshooter. 
He  is  learning  how  to  troubleshoot  circuits.   And  that  system 
manages  to  do  in  effectively  real  time   very  nice  inferences 
sufficient  to  let  a  student  ask  questions  about  the  state  of 
a  circuit,  make  changes  in  the  state  of  the  circuit,  and  so  forth. 
All  these  happen  on  the  order  of  a  few  seconds,  and  the  technique 
that  he  uses  involves  a  number  of  different  representations  of 
essentially  the  same  knowledge.   Some  of  them  are  at  a  theorem 
proving  level  where  you  can  do  pattern  matching  and  draw 
inferences  but  that  processing  is  sort  of  lengthy.   And  others 
are  right  down  at  the  level  where  you  can  set  up  the  parameters, 
run  a  simulator  on  the  circuit,  let  it  reach  steady  state,  and 
look at the values at all the nodes. He has a variety of
specialists  along  that  scale  and  he  moves  up  and  down  that 
level  of  specificity  to  just  the  right  one  to  make   each  of 
the  different  kinds  of  inferences  he  does.   And  he  gets  very, 
very  fast  systems. 
Misek: 

Do  you  think  that  it  would  be  interesting  to  select  passages  of 
different  genres  of  discourse,  probably  all  in  English,  and 




simultaneously  run  each  of  your  taxonomies  --  even  though  you 

have  presented  them  to  us  as  provisional  and  operational  in 

context  —  over  these  natural  language  texts   to  see  a  couple 

of  things.   One,  what  patterns  in  the  natural  language  text 

emerge  more  clearly  to  us  by  virtue  of  simultaneously  observing 

the  features  which  your  taxonomies  would  attach  to  those 

patterns?   Second,   what  consistencies  were  there  across  your 

categories   although  in  developing  and  embedding  them 

it  was  not  your  intent  to  artificially  force  generalization. 

Woods : 

I  don't  think  you  would  really  get  off  the  ground  with  that 

program  . . . 

Sager : 

I  think  we  have  to  agree  on  something,  I  mean  no  poetry  please. 

Misek : 

I  raise  the  question  because  the  manuscripts  to  which  I  have 

access  by  these  authors  are  not  as  rich  in  actual  lists  of  things 

as  one  would  need  in  order  to  apply  them. 

Simmons : 

I  begin  to  get  a  sense  from  a  couple  of  the  questions  that  there 

is some kind of assumption that there is something practical

going  on  here. 

I  think   it  is  very  important  to  emphasize  that  despite  the 

fact  that  Stan  [Petrick]'s  system  will  answer  questions  and 

that  Bill  [Woods]'  system  will  answer  questions  and  various 

other  systems  will  do  things;  these  are  not  practical  systems. 

These are experimental systems to learn the methodology,

to  work  out  what  needs  to  be  done,  what  are  the  limitations 

of  what  we  now  know.   And  I  notice  in  a  couple  of  questions 

sort  of  the  assumption:   how  can  we  use  this,   how  can  we  make 

comparative analyses of what one system does and another

system  does.      And   I    don't   think    that    is    really    too   appropriate. 

Misek : 

I  think  this  would  apply  to  Mr.  Woods'  assertion  that  we  have 

to  try  and  understand  the  ontological  commitments    of  our 

notations;  why  would  not  a  cross-reference  study  of  the 




notational systems yield something theoretically useful?

Simmons : 

Possibly  it  would,  but  why  not  study,  why  not  go  and  develop 

the  notations? 

Petrick : 

Most  of  the  sentences  would  not  be  analyzed  by  any  of  our  systems. 

Sager : 

I  support  the  young  lady.   I  think  it  is  very  hard  to  find  from 

the  literature  a  clear  picture  of  what  the  coverage  of  our 

systems  is.   I  think  Stan  [Petrick]  made  this  point  two  years 

ago  in  Washington  —  we  were  going  to  try  to  get  together  to 

make  it  possible  to  see  what  the  representations  are,  what  the 

coverage  is,  where  the  problems  are;   we  all  have  problems. 

Don't  think  you  just  take  a  text  and  push  it  in  the  machine 

and  it  comes  out  on  the  other  end  with  all  these  beautiful  parses; 

it  is  still  an  experimental  process.   But  I  think  it  is  a  very 

good  suggestion  —  if  there  are  some  sentences  that  we  can't 

handle  we  can  write  a  dissertation  about  what  problems  in  grammar 

or semantics are involved, and it would give the community a

chance  to  see  what  these  notations  are  worth.   I  think  it  is 

an  excellent  idea.   Now  do  something,  somebody. 

Woods  : 

What you are talking about is collecting a sort of benchmark

list  of  problem  sentences. 

Sager: 

Let's  start  with  the  ones  we  can  do. 

Woods : 

That  is  one  way  to  collect  them. 

Martin : 

I  think  that  a   little  more  conservative  approach  would  be  to 

elicit  from  each  person  all  the  sentences  they  claim  they  can  do 

and  then  compare  those  across  the  various  systems.   You  would 

still  have  a  lot  of  trouble  because  they  are  in  different  domains. 

If  you  go  back  to  Woods'  airline  system,  for  instance,  it  seems 

to  me  that  it  is  pretty  specific  to  airlines. 
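
The comparison Martin proposes could be sketched in Python roughly as follows. This is only an illustration: the system names and sentences are invented placeholders, not data from any of the systems discussed, and the sketch does nothing about the differences in domain that Martin warns of.

```python
from typing import Dict, List, Set


def coverage_table(claims: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each sentence to the sorted list of systems claiming to handle it."""
    table: Dict[str, List[str]] = {}
    for system, sentences in claims.items():
        for sentence in sentences:
            table.setdefault(sentence, []).append(system)
    return {sentence: sorted(systems) for sentence, systems in table.items()}


def common_core(claims: Dict[str, Set[str]]) -> Set[str]:
    """Return the sentences that every system claims to handle."""
    sets = list(claims.values())
    return set.intersection(*sets) if sets else set()


# Hypothetical claims elicited from three imaginary systems.
claims = {
    "system_a": {"Is there a flight from Boston to Chicago?", "List all flights."},
    "system_b": {"Is there a flight from Boston to Chicago?"},
    "system_c": {"Is there a flight from Boston to Chicago?", "List all flights."},
}
```

Here `common_core(claims)` gives the sentences claimed by all systems, and `coverage_table(claims)` shows, sentence by sentence, which systems claim it.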


Woods  : 

No,  the  rules  in  the  airline  guide  application  are  essentially 
pattern-action  rules  that  say,  if  you  have  got  this  assemblage 
of things, it means this.  I in fact took that same system
and  started  putting  in  rules  to  let  it  talk  about  its  own 
grammar.   And  there  you  could  have  arcs  that  connect  states, 
whereas  in  the  airline  application  you  had  flights  that 
connect cities.  And you could ask questions: is there an
arc that goes from state S/ to S/NP and the thing would
understand that meant the kind of 'go to' that applied to connections
of  arcs  in  the  grammar.   And  if  you  said,  "is  there  a  flight 
that  goes  from  Boston  to  Chicago?"   it  would  say,  oh,  that's 
the kind of 'go to' that means: look it up in the airline guide.
And  when  I  put  the  second  set  of  rules  in,  for  talking  about 
the  grammar,  I  didn't  take  out  the  first  set  of  rules  that 
talked  about  airline  schedules. 
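
The idea of pattern-action rules that pick the right sense of "go from X to Y" can be illustrated with a small Python sketch. It is not Woods' implementation: the rule format, the entity classes, the state names, and the toy data are all assumptions made up for the example.

```python
from typing import Callable, List, Optional, Tuple

# All names and data below are hypothetical, chosen only for illustration.
ENTITY_CLASS = {
    "Boston": "city", "Chicago": "city",   # airline domain
    "S/": "state", "S/NP": "state",        # grammar domain (illustrative state names)
}
FLIGHTS = {("Boston", "Chicago")}   # toy airline guide
ARCS = {("S/", "S/NP")}             # toy grammar network

# A rule pairs a pattern (verb, subject class, endpoint class) with an action.
Rule = Tuple[str, str, str, Callable[[str, str], bool]]
RULES: List[Rule] = [
    ("go", "flight", "city", lambda a, b: (a, b) in FLIGHTS),   # first rule set
    ("go", "arc", "state", lambda a, b: (a, b) in ARCS),        # added later
]


def interpret(verb: str, subject: str, origin: str, dest: str) -> Optional[bool]:
    """Run the action of the first rule whose pattern matches; None if no rule applies."""
    for rule_verb, rule_subject, endpoint_class, action in RULES:
        if ((verb, subject) == (rule_verb, rule_subject)
                and ENTITY_CLASS.get(origin) == endpoint_class
                and ENTITY_CLASS.get(dest) == endpoint_class):
            return action(origin, dest)
    return None
```

In this sketch the grammar rule is simply appended to `RULES`, and the airline rule stays in place. So `interpret("go", "flight", "Boston", "Chicago")` still consults the toy airline guide, while `interpret("go", "arc", "S/", "S/NP")` consults the toy grammar network.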
Martin : 

Well,  that  is  not  my  point. 
Woods :

In  answer  to  the  question  that  was  raised  earlier,  if  when  you 
go  from  application  to  application  you  force  this  kind  of 
backwards  compatibility,  and  don't  adopt  an  approach  on  the  new 
application  that  wouldn't  have  worked  for  the  previous  one, 
then  you  tend  to  keep  stretching  your  system. 
Martin : 

Yes,  but  the  problem  is  that,  for  each  new  text,  she  is  going 
to  have  to  do  this  forward  extension  on  her  own  and  you  probably 
don't  provide  a  manual  that  says  how  you  would  do  that, 
so  that,  when  she  does  it,  her  conclusions  may  not  be  those 
you would get if you had done that.  That is the problem
that  she  is  going  to  face. 
Petrick: 

All  she  could  do,  you  would  say,  is  start  with  a  group  of 
sentences,  make  some  minimal  perturbations  of  them  to  try  to 
produce a set that will be good for all of the systems, and
then  perhaps  you  come  back  to  us  to  say:   Is  this  something  that 
you  can  process?   Can  you  add  another  lexical  item  to  handle  this? 
Or  do  you  have  to  perturb  it  so  much  that  it  is  a  whole  new  problem? 
Just to get it working syntactically (which is, I imagine,
what you were talking about), not worrying about the
additional  semantic  problems. 
Peter  Wegner  [Brown  University]: 

I  think  there  are  two  kinds  of  semantics,  namely  the  semantics 
of  sentences  in  the  language  and  the  semantics  of  information 
in  the  data  base.   I  think  that  Bill  Woods  to  some  extent  implied 
that  you  could  handle  the  two  by  the  notion  of  procedure  in  a 
common  way.   And  yet  on  the  other  hand  there  are  practical 
differences  between  the  context  in  which  you  encounter  data 
bases  and  sentences  in  the  natural  language.  So  I  would  like 
to  ask  you,  do  you  find  you  could  have  a  common  framework  for 
these  two  kinds  of  semantics?    When  a  sentence  gets  translated 
into  some  internal  representation,  is  the  semantic  representation 
at  any  stage  of  the  translation  the  same  kind  of  representation 
that  you  use  in  the  data  base,  or  is  there  some  distinction? 
Woods : 

The  conceptual  possibilities  are  the  same  but  in  general,  when 
you  have  a  particular  sentence  immediately  in  mind  you  remember 
a lot about it and have lots of things in its semantic
representation that you tend to throw away and forget.  What ultimately
ends  up  in  your  long  term  memory  has  been  condensed  somewhat  and 
factored  to  use  pieces  of  long  term  memory  that  were  already  there. 
So  it  is  not  a  matter  of  just  taking  the  semantic  representation 
of  the  sentence  and  copying  it  whole  cloth  into  your  long  term 
memory,  but  I  don't  think  there  is  really  a  different  inventory 
of  semantic  primitives  at  the  two  different  levels. 


REPORT DOCUMENTATION PAGE

1. Report Number: NSO-7

4. Title (and Subtitle): Directions in Artificial Intelligence:
   Natural Language Processing

5. Type of Report & Period Covered: Symposium Proceedings,
   December 6, 1974

7. Author(s): Ralph Grishman (editor)

8. Contract or Grant Number(s): N00014-67A-0467-0032

9. Performing Organization Name and Address: Courant Inst. Math. Sci.,
   New York University, 251 Mercer Street, New York, N.Y. 10012

11. Controlling Office Name and Address: Dr. Marvin Denicoff,
    Office of Naval Research, Department of the Navy,
    Arlington, Va. 22217

12. Report Date: August 1975

13. Number of Pages: viii + 107

15. Security Class. (of this report): unclassified

16. Distribution Statement (of this Report): Distribution of this
    report is unlimited.

19. Key Words: artificial intelligence, natural language, syntax,
    semantics, parsing, information retrieval

20. Abstract:

Proceedings of a symposium on natural language processing held at
the Courant Institute of Mathematical Sciences, New York University,
on December 6, 1974.  The talks were concerned with the analysis
of the structure among definitions in a dictionary, the automatic
generation of semantic word classes by text analysis, the design
of semantic hierarchies, and transformational language analysis
procedures and underlying structures for information retrieval.
